 ## Introduction
 
    * Pandas is a Python library that provides data structures and data analysis tools.
    * Pandas lets you load data from different sources into Python, analyse it with Python code, and produce results as tables, text or visualizations.
    * It is especially useful for processing tabular data, that is, data organised in rows and columns.
    * The main data structure in Pandas is the DataFrame.

In [1]:
import pandas

In [2]:
# Creating a data frame manually
df1 = pandas.DataFrame([[2,4,6],[10,20,30]])

In [3]:
df1

Unnamed: 0,0,1,2
0,2,4,6
1,10,20,30


In [4]:
# Name the columns of the DataFrame
df1 = pandas.DataFrame([[2,4,6],[10,20,30]],columns = ["Price","Age","Values"])

In [5]:
df1

Unnamed: 0,Price,Age,Values
0,2,4,6
1,10,20,30


In [6]:
# Name the rows (the index) of the DataFrame
df1 = pandas.DataFrame([[2,4,6],[10,20,30]], columns = ["Price","Age","Values"], index = ["First","Second"])

In [7]:
df1

Unnamed: 0,Price,Age,Values
First,2,4,6
Second,10,20,30


In [8]:
# Another way of creating a DataFrame manually: from a list of dictionaries
df2 = pandas.DataFrame([{"Name":"John"},{"Name":"Jack"}])
df2

Unnamed: 0,Name
0,John
1,Jack


In [9]:
# The first row has two key-value pairs but the second has only one, so the missing Surname in the second row appears as NaN in the output
df2 = pandas.DataFrame([{"Name":"John","Surname":"Jones"},{"Name":"Jack"}])
df2

Unnamed: 0,Name,Surname
0,John,Jones
1,Jack,
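
The empty Surname cell above is stored as NaN. As a minimal sketch of how such gaps could be detected and filled, the example below uses `isna` and `fillna` on the same records; the placeholder string "Unknown" is an assumption chosen only for illustration.

```python
import pandas

# Same records as above: the second dict has no "Surname" key
df2 = pandas.DataFrame([{"Name": "John", "Surname": "Jones"}, {"Name": "Jack"}])

# isna() returns a DataFrame of booleans, True where a value is missing
missing = df2.isna()

# fillna() returns a new DataFrame with the gaps replaced; df2 itself is unchanged
filled = df2.fillna("Unknown")  # "Unknown" is an illustrative placeholder

print(missing)
print(filled)
```

Whether to fill, drop or keep missing values depends on the analysis; this only shows the mechanics.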


In [10]:
type(df1)

pandas.core.frame.DataFrame

In [11]:
# Find the mean of each column in the DataFrame
df1.mean()

Price      6.0
Age       12.0
Values    18.0
dtype: float64

In [12]:
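# Find the mean of the entire DataFrame (the mean of the column means, which equals the overall mean here because every column has the same number of values)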
df1.mean().mean()

12.0
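
Taking the mean of the column means matches the mean of all values only when every column holds the same number of non-missing values. The sketch below illustrates this under that assumption; the `gappy` frame and its values are hypothetical and not part of the notebook's data.

```python
import pandas

df1 = pandas.DataFrame(
    [[2, 4, 6], [10, 20, 30]],
    columns=["Price", "Age", "Values"],
    index=["First", "Second"],
)

# With complete data the two approaches agree
mean_of_column_means = df1.mean().mean()
overall_mean = df1.to_numpy().mean()

# Hypothetical frame with one missing value in column "b"
gappy = pandas.DataFrame({"a": [1.0, 3.0], "b": [10.0, None]})

# mean() skips NaN per column, so "b" contributes a mean based on one value
gappy_mean_of_means = gappy.mean().mean()

# Overall mean of the non-missing values: total sum divided by total count
gappy_overall = gappy.sum().sum() / gappy.count().sum()

print(mean_of_column_means, overall_mean)
print(gappy_mean_of_means, gappy_overall)
```

With missing values the two results differ, so it is worth deciding which of the two quantities is actually wanted.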

In [13]:
type(df1.mean())

pandas.core.series.Series

In [14]:
df1.Price

First      2
Second    10
Name: Price, dtype: int64

In [15]:
type(df1.Price)

pandas.core.series.Series

A DataFrame is made up of Series: each column is a Series.
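
To make the relationship concrete, here is a small sketch that goes the other way and builds a DataFrame from two Series with `pandas.concat`. The variable names `price` and `age` are illustrative; the values reuse the first two columns of `df1`.

```python
import pandas

# Two Series sharing the same row labels
price = pandas.Series([2, 10], index=["First", "Second"], name="Price")
age = pandas.Series([4, 20], index=["First", "Second"], name="Age")

# concat with axis=1 places the Series side by side, one per column
df = pandas.concat([price, age], axis=1)

# Bracket access returns the same column as attribute access (df.Price)
same = df["Price"].equals(df.Price)

print(df)
print(type(df["Price"]), same)
```

Bracket access (`df["Price"]`) also works for column names that contain spaces or clash with DataFrame method names, where attribute access does not.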

In [16]:
df1.Price.mean()

6.0

In [17]:
df1.Price.max()

10

In [18]:
import os
os.listdir()

['.ipynb_checkpoints',
 'PandasLibrary.ipynb',
 'supermarkets-commas.txt',
 'supermarkets-semi-colons.txt',
 'supermarkets.csv',
 'supermarkets.json',
 'supermarkets.xlsx']

In [19]:
# read_csv has a 'header' parameter that defaults to 'infer', so the first row of the CSV is used as the column names.
# To treat the first row as ordinary data instead, pass header=None.
df_csv = pandas.read_csv("supermarkets.csv")
df_csv.set_index("ID")

ID,Address,City,State,Country,Name,Employees
1,3666 21st St,San Francisco,CA 94114,USA,Madeira,8
2,735 Dolores St,San Francisco,CA 94119,USA,Bready Shop,15
3,332 Hill St,San Francisco,California 94114,USA,Super River,25
4,3995 23rd St,San Francisco,CA 94114,USA,Ben's Shop,10
5,1056 Sanchez St,San Francisco,California,USA,Sanchez,12
6,551 Alvarado St,San Francisco,CA 94114,USA,Richvalley,20
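
The effect of the `header` parameter of `read_csv` can be sketched without any file on disk. The example below reads a small in-memory CSV through `io.StringIO`; the two-row sample text is an assumption made up for illustration, not the contents of supermarkets.csv.

```python
import io
import pandas

# Hypothetical two-row CSV, held in memory instead of a file
csv_text = "ID,Name\n1,Madeira\n2,Sanchez\n"

# Default behaviour: the first row becomes the column names
with_header = pandas.read_csv(io.StringIO(csv_text))

# header=None: the first row is kept as data and columns are numbered 0, 1, ...
no_header = pandas.read_csv(io.StringIO(csv_text), header=None)

print(with_header)
print(no_header)
```

With `header=None` the frame has one more row, and the former column names appear as its first data row.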


In [20]:
df_csv.shape

(6, 7)
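
The shape still reports 7 columns because `set_index` returns a new DataFrame and leaves the original untouched unless the result is assigned. A minimal sketch on a hypothetical two-row frame:

```python
import pandas

# Hypothetical small frame standing in for the supermarket data
df = pandas.DataFrame({"ID": [1, 2], "Name": ["Madeira", "Sanchez"]})

# set_index returns a new DataFrame; df keeps "ID" as an ordinary column
indexed = df.set_index("ID")

print(df.shape)       # original frame, "ID" still counted as a column
print(indexed.shape)  # "ID" has moved to the index
```

To keep the new index, assign the result back, for example `df = df.set_index("ID")`.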

In [21]:
df_json = pandas.read_json("supermarkets.json")
df_json.set_index("ID")

ID,Address,City,Country,Employees,Name,State
1,3666 21st St,San Francisco,USA,8,Madeira,CA 94114
2,735 Dolores St,San Francisco,USA,15,Bready Shop,CA 94119
3,332 Hill St,San Francisco,USA,25,Super River,California 94114
4,3995 23rd St,San Francisco,USA,10,Ben's Shop,CA 94114
5,1056 Sanchez St,San Francisco,USA,12,Sanchez,California
6,551 Alvarado St,San Francisco,USA,20,Richvalley,CA 94114
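
As a rough sketch of what `read_json` expects, the example below parses a list of JSON records from memory. The JSON text is a made-up two-record sample and is only assumed to resemble the layout of supermarkets.json, which is not shown in the notebook.

```python
import io
import pandas

# Hypothetical JSON: a list of records, one object per row
json_text = '[{"ID": 1, "Name": "Madeira"}, {"ID": 2, "Name": "Sanchez"}]'

# Wrapping the text in StringIO lets read_json treat it like a file
df_json_demo = pandas.read_json(io.StringIO(json_text))

# Each key becomes a column; set_index moves "ID" into the row labels
by_id = df_json_demo.set_index("ID")

print(by_id)
```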


In [22]:
df_xlsx = pandas.read_excel("supermarkets.xlsx", sheet_name=0)
df_xlsx.set_index("ID")

ID,Address,City,State,Country,Supermarket Name,Number of Employees
1,3666 21st St,San Francisco,CA 94114,USA,Madeira,8
2,735 Dolores St,San Francisco,CA 94119,USA,Bready Shop,15
3,332 Hill St,San Francisco,California 94114,USA,Super River,25
4,3995 23rd St,San Francisco,CA 94114,USA,Ben's Shop,10
5,1056 Sanchez St,San Francisco,California,USA,Sanchez,12
6,551 Alvarado St,San Francisco,CA 94114,USA,Richvalley,20


In [23]:
df_commas_text = pandas.read_csv("supermarkets-commas.txt")
df_commas_text.set_index("ID")

ID,Address,City,State,Country,Name,Employees
1,3666 21st St,San Francisco,CA 94114,USA,Madeira,8
2,735 Dolores St,San Francisco,CA 94119,USA,Bready Shop,15
3,332 Hill St,San Francisco,California 94114,USA,Super River,25
4,3995 23rd St,San Francisco,CA 94114,USA,Ben's Shop,10
5,1056 Sanchez St,San Francisco,California,USA,Sanchez,12
6,551 Alvarado St,San Francisco,CA 94114,USA,Richvalley,20


In [24]:
df_semi_colons_txt = pandas.read_csv("supermarkets-semi-colons.txt",sep=";")
df_semi_colons_txt

Unnamed: 0,ID,Address,City,State,Country,Name,Employees
0,1,3666 21st St,San Francisco,CA 94114,USA,Madeira,8
1,2,735 Dolores St,San Francisco,CA 94119,USA,Bready Shop,15
2,3,332 Hill St,San Francisco,California 94114,USA,Super River,25
3,4,3995 23rd St,San Francisco,CA 94114,USA,Ben's Shop,10
4,5,1056 Sanchez St,San Francisco,California,USA,Sanchez,12
5,6,551 Alvarado St,San Francisco,CA 94114,USA,Richvalley,20
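
The `sep` argument matters because `read_csv` splits on commas by default. The sketch below shows what happens with and without it on a small semicolon-separated sample; the sample text is an assumption for illustration and not the actual file contents.

```python
import io
import pandas

# Hypothetical semicolon-separated text
text = "ID;Name;Employees\n1;Madeira;8\n2;Bready Shop;15\n"

# Default sep=",": no commas are found, so every line lands in a single column
wrong = pandas.read_csv(io.StringIO(text))

# sep=";" splits each line into the intended three columns
right = pandas.read_csv(io.StringIO(text), sep=";")

print(wrong.shape)
print(right.shape)
```

A frame with a single, oddly named column after loading is usually a sign that the separator was not the one the file uses.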


In [25]:
df_url_csv = pandas.read_csv("http://pythonhow.com/supermarkets.csv")
df_url_csv

Unnamed: 0,ID,Address,City,State,Country,Name,Employees
0,1,3666 21st St,San Francisco,CA 94114,USA,Madeira,8
1,2,735 Dolores St,San Francisco,CA 94119,USA,Bready Shop,15
2,3,332 Hill St,San Francisco,California 94114,USA,Super River,25
3,4,3995 23rd St,San Francisco,CA 94114,USA,Ben's Shop,10
4,5,1056 Sanchez St,San Francisco,California,USA,Sanchez,12
5,6,551 Alvarado St,San Francisco,CA 94114,USA,Richvalley,20
