# Pandas


### 1. The Pandas Series 

#### Creating Pandas Series

You can create pandas series from array-like objects using ```pd.Series()```.

In [None]:
# import pandas, pd is an alias
import pandas as pd
import numpy as np

# Creating a numeric pandas series
num_series = pd.Series([1,3,5,7,8])
print(num_series)
print(type(num_series))

Note that each element in the Series has an index, and the index starts at 0 as usual.

In [None]:
# creating a series of strings
# notice that the 'dtype' here is 'object'
char_series = pd.Series(['hi', 'n', 'aw'])
char_series

In [None]:
# creating a series of type datetime
# pd.date_range() returns a DatetimeIndex, so wrap it in pd.Series() to get a Series
date_series = pd.Series(pd.date_range(start='2019-01-20', end='2019-01-30'))
date_series

In [None]:
type(date_series)

#### Indexing Series

Indexing a series that has the default index works the same way as indexing a 1-D numpy array - the index starts at 0.

In [None]:
# Indexing pandas series: same as indexing 1-D numpy arrays or lists
# accessing the third element (print is needed because only the last expression in a cell is displayed)
print(num_series[2])

# accessing elements from index 3 till the end
print(num_series[3:])

In [None]:
# accessing the second and the fourth elements
# note that num_series[1, 3] will not work; pass the indices as a list inside the outer []
num_series[[1, 3]]
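
With the default index, positions and labels happen to coincide, which is why plain `[]` behaves like array indexing above. When they differ, it can help to state the intent explicitly. The following is a minimal sketch, assuming a made-up series `s` with a non-default integer index; `.iloc` selects by position and `.loc` selects by label.

```python
import pandas as pd

# Hypothetical series with a non-default integer index, for illustration only
s = pd.Series([10, 20, 30, 40], index=[3, 2, 1, 0])

# .iloc selects by position: the first element, whatever its label is
first_by_position = s.iloc[0]

# .loc selects by label: the element whose index label is 0
value_at_label_0 = s.loc[0]

# Both accessors also accept a list of positions or labels
by_positions = s.iloc[[1, 3]]
by_labels = s.loc[[2, 0]]

print(first_by_position, value_at_label_0)
print(by_positions.tolist(), by_labels.tolist())
```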

#### Explicitly specifying indices

You might have noticed that while creating a series, Pandas automatically indexes it from 0 to (n-1), n being the number of rows. If we want, we can also set the index explicitly, using the `index` argument of `pd.Series()`.

In [None]:
# Indexing explicitly
pd.Series([1, 2, 3], index = ['a', 'b', 'c'])

In [None]:
# You can also give the index as a sequence or use functions to generate it
# The number of index labels must equal the number of elements in the series

pd.Series(np.arange(1, 10)**2, index = range(1, 10))
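
Once a series has explicit labels, elements can be looked up by those labels. The sketch below uses a small made-up series to show label-based access, and also what happens when the index length does not match the number of values (pandas raises a `ValueError`).

```python
import pandas as pd

# Hypothetical labelled series, for illustration only
s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

# Access a single element, or several elements, by label
b_value = s['b']
subset = s[['a', 'c']]

# A mismatch between the number of values and index labels raises ValueError
try:
    pd.Series([1, 2, 3], index=['a', 'b'])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(b_value, subset.tolist(), mismatch_raised)
```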

Usually, you will work with Series only as a part of dataframes. Let's study the basics of dataframes.

### 2. The Pandas Dataframe

#### Creating dataframes from dictionaries

There are various ways of creating dataframes, such as building them from dictionaries or JSON objects, or reading them from text and CSV files.

In [None]:
# keys become column names
df = pd.DataFrame({'name': ['Chetan', 'Vishal', 'Manoj', 'Akhilesh'],
                   'age': [40, 43, 35, 37],
                   'occupation': ['writer', 'composer', 'actor', 'politician']})
df

#### Importing CSV data files as pandas dataframes 

In [None]:
# reading a CSV file as a dataframe
market_df = pd.read_csv("global_sales_data/market_fact.csv")

Usually, dataframes are created by importing CSV files, but sometimes it is more convenient to convert dictionaries into dataframes. For example, when the raw data is in JSON format (which is not uncommon), you can easily convert it into a dictionary, and then into a dataframe.

You will learn how to convert JSON objects to dataframes later.
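
As a brief preview, here is a minimal sketch of the dictionary route described above. The JSON string and its field names are made up for illustration; real JSON data is often nested and may need more work before it fits a table.

```python
import json
import pandas as pd

# Hypothetical JSON payload; the field names are made up for this example
raw_json = '{"name": ["Chetan", "Vishal"], "age": [40, 43]}'

# Step 1: parse the JSON text into a Python dictionary
records = json.loads(raw_json)

# Step 2: build a dataframe from the dictionary (keys become column names)
json_df = pd.DataFrame(records)
print(json_df)
```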

#### Importing Excel data files as pandas dataframes

In [None]:
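# reading an Excel file as a dataframe (reading .xls files requires the xlrd package)
# note that this replaces the df created from the dictionary above
df = pd.read_excel("iris.xls")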

#### Reading and Summarising Dataframes

After you import a dataframe, you'd want to quickly understand its structure, its shape, and what its rows and columns represent. You may also want to look at summary statistics such as the mean and percentiles.

In [None]:
# Looking at the top and bottom entries of dataframes
df.head()

In [None]:
df.tail()

Here, each row represents one observation (a single flower) from the iris dataset. Notice the index associated with each row - starts at 0 and ends at 149, implying that there are 150 observations.

In [None]:
# Looking at the top and bottom entries of dataframes
market_df.head()

In [None]:
market_df.tail()

Here, each row represents an order placed at a retail store. Notice the index associated with each row - starts at 0 and ends at 8398, implying that there were 8399 orders placed.

In [None]:
# Looking at the datatypes of each column
df.info()

In [None]:
# Looking at the datatypes of each column
market_df.info()

In [None]:
# Describe gives you a summary of all the numeric columns in the dataset
df.describe()

In [None]:
# Describe gives you a summary of all the numeric columns in the dataset
market_df.describe()

In [None]:
# Column names
df.columns

In [None]:
# Column names
market_df.columns

In [None]:
# The number of rows and columns
df.shape

In [None]:
# The number of rows and columns
market_df.shape

In [None]:
# You can extract the values of a dataframe as a numpy array using df.values 
df.values

In [None]:
# You can extract the values of a dataframe as a numpy array using df.values 
market_df.values
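
The pandas documentation recommends `DataFrame.to_numpy()` over `.values` for new code. A minimal sketch on a small made-up dataframe (`small_df` and its columns are illustrative, not part of the datasets above):

```python
import pandas as pd

# Hypothetical dataframe with one integer and one float column
small_df = pd.DataFrame({'age': [40, 43], 'score': [1.5, 2.5]})

# to_numpy() returns a 2-D array; mixed int/float columns are upcast to a common dtype
arr = small_df.to_numpy()
print(type(arr), arr.shape, arr.dtype)
```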

#### Indices 

An important concept in pandas dataframes is that of *row indices*. By default, each row is assigned an index starting from 0, and the indices are shown on the left side of the dataframe.
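
To make this concrete, the sketch below inspects the default index of a small made-up dataframe and then uses one of its columns as the index instead. The `people` dataframe is an assumption for illustration; `set_index()` returns a new dataframe and leaves the original unchanged.

```python
import pandas as pd

# Hypothetical dataframe, for illustration only
people = pd.DataFrame({'name': ['Chetan', 'Vishal', 'Manoj'],
                       'age': [40, 43, 35]})

# The default index runs from 0 to n-1
default_index = list(people.index)

# Use the 'name' column as the row index instead
by_name = people.set_index('name')

# Rows can now be looked up by name with .loc
manoj_age = by_name.loc['Manoj', 'age']

print(default_index, manoj_age)
```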

#### Sorting dataframes

You can sort dataframes in two ways - 1) by the indices and 2) by the values.  
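
Both kinds of sorting can be sketched on a small made-up dataframe whose index is deliberately out of order: `sort_index()` orders rows by their index labels, and `sort_values()` orders them by the values in one or more columns. Both return a new dataframe by default.

```python
import pandas as pd

# Hypothetical dataframe with a shuffled index, for illustration only
people = pd.DataFrame({'name': ['Chetan', 'Vishal', 'Manoj'],
                       'age': [40, 43, 35]},
                      index=[2, 0, 1])

# 1) Sort by the index labels (ascending by default)
by_index = people.sort_index()

# 2) Sort by the values of a column, here oldest first
by_age = people.sort_values(by='age', ascending=False)

print(by_index)
print(by_age)
```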
