Pandas has two main data structures:

- Series
- DataFrames

DataFrames are the more commonly used of the two, so most of this session focuses on them. When you encounter a Series, Behzad will explain it briefly. Let's begin by introducing Pandas DataFrames.

DataFrame
A DataFrame is a table made of rows and columns: each row has an index label, and each column has a name. There are various ways of creating DataFrames, for instance from dictionaries, or by reading .txt and .csv files. Let's take a look at them one by one.
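To make the definition concrete, here is a minimal sketch that inspects the row labels (`index`) and column labels (`columns`) of a small DataFrame and shows that selecting one column returns a Series sharing the same index. The two-row frame is illustrative, built from values in the cars data used below.

```python
import pandas as pd

# Illustrative two-row DataFrame (values taken from the cars example below)
df = pd.DataFrame({"country": ["Japan", "India"], "cars_per_cap": [588, 18]})

print(df.index)    # row labels: a default RangeIndex 0, 1
print(df.columns)  # column labels: 'country', 'cars_per_cap'

# Selecting a single column returns a Series that keeps the DataFrame's index
col = df["cars_per_cap"]
print(type(col))   # <class 'pandas.core.series.Series'>
```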

Creating dataframes from dictionaries

If your data is already in Python as lists, you can create a DataFrame directly by collecting those lists in a dictionary. Each key of the dictionary becomes a column name, and the list stored under that key becomes the entries of that column.
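A minimal sketch of that mapping, using three entries from the cars data below. It also shows one constraint worth knowing: every list must have the same length, otherwise pandas raises a `ValueError` because the values cannot be lined up into rows.

```python
import pandas as pd

# Each key becomes a column name; each list supplies that column's values
data = {"country": ["Japan", "India", "Egypt"], "cars_per_cap": [588, 18, 45]}
df = pd.DataFrame(data)
print(df.shape)  # (3, 2): three rows, two columns

# Lists of unequal length cannot be aligned into rows, so pandas rejects them
raised = False
try:
    pd.DataFrame({"country": ["Japan", "India"], "cars_per_cap": [588]})
except ValueError as err:
    raised = True
    print("ValueError:", err)
```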

<h2 style = "color : Brown"> Data Frame </h2>

In [8]:
# All imports
import numpy as np
import pandas as pd

<h4 style = "color : Sky blue"> Example - 1</h4>  

##### Create a Data Frame cars using raw data stored in a dictionary


In [11]:
cars_per_cap = [809, 731, 588, 18, 200, 70, 45]
country = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt']
drives_right = [True, False, False, False, True, True, True]

In [13]:
data = {"cars_per_cap": cars_per_cap, "country": country, "drives_right": drives_right}

In [15]:
data

{'cars_per_cap': [809, 731, 588, 18, 200, 70, 45],
 'country': ['United States',
  'Australia',
  'Japan',
  'India',
  'Russia',
  'Morocco',
  'Egypt'],
 'drives_right': [True, False, False, False, True, True, True]}

In [17]:
cars = pd.DataFrame(data)

cars

   cars_per_cap        country  drives_right
0           809  United States          True
1           731      Australia         False
2           588          Japan         False
3            18          India         False
4           200         Russia          True
5            70        Morocco          True
6            45          Egypt          True


In [19]:
type(cars)

pandas.core.frame.DataFrame

In [None]:
# To create a DataFrame from a dictionary, run:

# pd.DataFrame(dictionary_name)


# You can also pass a list of lists (one inner list per row) or a 2-D array.
# In that case, supply the column names with columns=; otherwise pandas labels the columns 0, 1, 2, ...

# pd.DataFrame(list_or_array_name, columns = ['column_1', 'column_2'])
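A short sketch of the list and array forms described in the comments above. The rows reuse values from the cars data, while `column_1`/`column_2` are placeholder names. It also shows the integer labels pandas assigns when `columns=` is omitted.

```python
import numpy as np
import pandas as pd

# A list of lists: each inner list is one row, column names supplied separately
rows = [[809, "United States", True], [731, "Australia", False]]
df_from_list = pd.DataFrame(rows, columns=["cars_per_cap", "country", "drives_right"])

# A 2-D NumPy array works the same way: one DataFrame row per array row
arr = np.array([[1, 2], [3, 4]])
df_from_array = pd.DataFrame(arr, columns=["column_1", "column_2"])

# Without columns=, pandas falls back to integer column labels
df_default = pd.DataFrame(arr)
print(list(df_default.columns))  # [0, 1]
```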

<h4 style = "color : Sky blue"> Example - 2 (Reading data from a file)</h4>  

##### Create a Data Frame by importing cars data from cars.csv

In [26]:
# Read a file using pandas

cars_df = pd.read_csv('cars.csv')

cars_df

    USCA   US  United States    809  FALSE
0  ASPAC  AUS      Australia  731.0   True
1  ASPAC  JAP          Japan  588.0   True
2  ASPAC   IN          India   18.0   True
3  ASPAC   RU         Russia  200.0  False
4  LATAM  MOR        Morocco   70.0  False
5    AFR   EG          Egypt   45.0  False
6    EUR  ENG        England    NaN   True
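Notice that the column headers above are really the first record of the file (USCA, US, United States, 809, FALSE). By default (`header='infer'`), `read_csv` treats the first line as column names, so that record is lost from the data. The sketch below reproduces this on a three-line stand-in for `cars.csv`; the inline text is an assumption that mirrors the rows shown, not the actual file. It compares the default with `header=None`, which the next example uses.

```python
import io
import pandas as pd

# Hypothetical stand-in for the first three lines of cars.csv (no header row)
sample = "USCA,US,United States,809,FALSE\nASPAC,AUS,Australia,731,TRUE\nASPAC,JAP,Japan,588,TRUE\n"

# Default header='infer': the first line is consumed as column names
inferred = pd.read_csv(io.StringIO(sample))
print(list(inferred.columns))  # ['USCA', 'US', 'United States', '809', 'FALSE']
print(len(inferred))           # 2: only two data rows remain

# header=None: every line is data, and columns are numbered 0..4
raw = pd.read_csv(io.StringIO(sample), header=None)
print(len(raw))                # 3
```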


<h4 style = "color : Sky blue"> Example - 3 (Column headers)</h4>  

##### Read file without a header row (header=None)

In [29]:
cars_df = pd.read_csv('cars.csv', header=None)

cars_df

       0    1              2      3      4
0   USCA   US  United States  809.0  False
1  ASPAC  AUS      Australia  731.0   True
2  ASPAC  JAP          Japan  588.0   True
3  ASPAC   IN          India   18.0   True
4  ASPAC   RU         Russia  200.0  False
5  LATAM  MOR        Morocco   70.0  False
6    AFR   EG          Egypt   45.0  False
7    EUR  ENG        England    NaN   True


##### Assign Headers

In [34]:
# Returns the column labels as an Index object

cars_df.columns

Index([0, 1, 2, 3, 4], dtype='int64')

In [36]:
# Rename headers: assign one label per column, in order
# (column 0 holds region codes such as USCA, column 1 holds country codes such as US)
cars_df.columns = ['region', 'country_code', 'country', 'cars_per_cap', 'drives_right']

In [38]:
cars_df

   region  country_code        country  cars_per_cap  drives_right
0    USCA            US  United States         809.0         False
1   ASPAC           AUS      Australia         731.0          True
2   ASPAC           JAP          Japan         588.0          True
3   ASPAC            IN          India          18.0          True
4   ASPAC            RU         Russia         200.0         False
5   LATAM           MOR        Morocco          70.0         False
6     AFR            EG          Egypt          45.0         False
7     EUR           ENG        England           NaN          True
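Assigning a full list, as above, replaces every label at once, so the list must have exactly one name per column. To change only a few labels, `DataFrame.rename` with a `columns=` mapping is an alternative. A minimal sketch on a one-row illustrative frame; the new label `code` is hypothetical.

```python
import pandas as pd

# Illustrative frame with default integer column labels, as produced by header=None
df = pd.DataFrame([["USCA", "US", 809.0]])

# Assigning to .columns relabels every column at once
df.columns = ["region", "country_code", "cars_per_cap"]

# A list of the wrong length is rejected with a ValueError (labels stay unchanged)
raised = False
try:
    df.columns = ["region", "country_code"]
except ValueError:
    raised = True

# rename() changes only the labels in the mapping and returns a new DataFrame
renamed = df.rename(columns={"country_code": "code"})
print(list(renamed.columns))  # ['region', 'code', 'cars_per_cap']
```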


<h4 style = "color : Sky blue"> Example - 4 (Row index/names) </h4>  

##### Read file without a header row and use the first column as the index

In [41]:
# Index is returned by
cars_df.index

RangeIndex(start=0, stop=8, step=1)

In [66]:
# Read file and set 1st column as index
cars_df = pd.read_csv("cars.csv", header=None, index_col=0)

cars_df

         1              2      3      4
0
USCA    US  United States  809.0  False
ASPAC  AUS      Australia  731.0   True
ASPAC  JAP          Japan  588.0   True
ASPAC   IN          India   18.0   True
ASPAC   RU         Russia  200.0  False
LATAM  MOR        Morocco   70.0  False
AFR     EG          Egypt   45.0  False
EUR    ENG        England    NaN   True


In [68]:
# Set the column names (the index column is not counted among the columns)
cars_df.columns = ['country_code', 'country', 'cars_per_cap', 'drives_right']
cars_df

       country_code        country  cars_per_cap  drives_right
0
USCA             US  United States         809.0         False
ASPAC           AUS      Australia         731.0          True
ASPAC           JAP          Japan         588.0          True
ASPAC            IN          India          18.0          True
ASPAC            RU         Russia         200.0         False
LATAM           MOR        Morocco          70.0         False
AFR              EG          Egypt          45.0         False
EUR             ENG        England           NaN          True


In [45]:
# Print the new index
cars_df.index


Index(['USCA', 'ASPAC', 'ASPAC', 'ASPAC', 'ASPAC', 'LATAM', 'AFR', 'EUR'], dtype='object', name=0)

##### Rename the Index Name

In [48]:
cars_df.index.name = 'region'
cars_df

        country_code        country  cars_per_cap  drives_right
region
USCA              US  United States         809.0         False
ASPAC            AUS      Australia         731.0          True
ASPAC            JAP          Japan         588.0          True
ASPAC             IN          India          18.0          True
ASPAC             RU         Russia         200.0         False
LATAM            MOR        Morocco          70.0         False
AFR               EG          Egypt          45.0         False
EUR              ENG        England           NaN          True


##### Delete the index name

In [51]:
cars_df.index.name = None
cars_df

       country_code        country  cars_per_cap  drives_right
USCA             US  United States         809.0         False
ASPAC           AUS      Australia         731.0          True
ASPAC           JAP          Japan         588.0          True
ASPAC            IN          India          18.0          True
ASPAC            RU         Russia         200.0         False
LATAM           MOR        Morocco          70.0         False
AFR              EG          Egypt          45.0         False
EUR             ENG        England           NaN          True
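The index above repeats ASPAC several times, which pandas allows: index labels do not have to be unique. A minimal sketch of what that means for label-based selection with `.loc`, using a three-row stand-in built from values in the table (the frame itself is illustrative).

```python
import pandas as pd

# Small stand-in for cars_df indexed by region codes
df = pd.DataFrame(
    {"country": ["United States", "Australia", "Japan"], "cars_per_cap": [809.0, 731.0, 588.0]},
    index=["USCA", "ASPAC", "ASPAC"],
)

# A repeated label returns every matching row, as a DataFrame
aspac = df.loc["ASPAC"]
print(type(aspac).__name__, len(aspac))  # DataFrame 2

# A unique label returns a single row, as a Series
us = df.loc["USCA"]
print(type(us).__name__)  # Series

print(df.index.is_unique)  # False
```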


##### Set Hierarchical index
It is also possible to create multilevel indexing for a DataFrame; this is known as hierarchical indexing.

In [77]:
# Read file without a header row
cars_df = pd.read_csv("cars.csv", header=None)

# Set the column names
cars_df.columns = ['region', 'country_code', 'country', 'cars_per_cap', 'drives_right']
# Earlier a single column formed the index (name=0); here two columns form a two-level index
cars_df.set_index(['region', 'country_code'], inplace=True)


In [56]:
cars_df

                           country  cars_per_cap  drives_right
region country_code
USCA   US            United States         809.0         False
ASPAC  AUS               Australia         731.0          True
       JAP                   Japan         588.0          True
       IN                    India          18.0          True
       RU                   Russia         200.0         False
LATAM  MOR                 Morocco          70.0         False
AFR    EG                    Egypt          45.0         False
EUR    ENG                 England           NaN          True
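With a two-level index, `.loc` can select on the outer level alone, or on both levels with a tuple, and `xs` can select on an inner level by name. A minimal sketch on a small illustrative frame indexed by (region, country_code). The `sort_index()` call is a precaution so that partial selection on the outer level does not trigger pandas' unsorted-index performance warning.

```python
import pandas as pd

# Illustrative frame with a two-level (hierarchical) index
df = (
    pd.DataFrame(
        {
            "region": ["USCA", "ASPAC", "ASPAC"],
            "country_code": ["US", "AUS", "JAP"],
            "cars_per_cap": [809.0, 731.0, 588.0],
        }
    )
    .set_index(["region", "country_code"])
    .sort_index()
)

# Outer level only: all rows for that region, indexed by the inner level
aspac = df.loc["ASPAC"]
print(aspac)

# Both levels as a tuple: a single value
print(df.loc[("ASPAC", "JAP"), "cars_per_cap"])  # 588.0

# xs() selects on a named inner level
aus = df.xs("AUS", level="country_code")
print(aus)
```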


<h4 style = "color : Sky blue"> Example - 5 (Write Data Frame to file) </h4>  

##### Write cars_df to cars_to_csv.csv

In [60]:
cars_df.to_csv('cars_to_csv.csv')
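By default `to_csv` also writes the index as the first column. A small sketch of the two behaviours: when no path is given, `to_csv` returns the CSV text as a string, which makes the effect of `index=False` easy to see, and `index_col=0` restores the saved index when reading the text back. The two-row frame is illustrative.

```python
import io
import pandas as pd

df = pd.DataFrame({"country": ["Japan", "India"], "cars_per_cap": [588, 18]})

# No path given: to_csv returns the CSV text instead of writing a file
with_index = df.to_csv()               # first column holds the index 0, 1
without_index = df.to_csv(index=False)
print(without_index)

# Reading the text back: index_col=0 turns the saved index column into the index
restored = pd.read_csv(io.StringIO(with_index), index_col=0)
```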

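The general form of the call is `pd.read_csv(filepath, sep=',', header='infer')`. You can specify the following details:

- `sep`: the field separator (`','` by default)
- `header`: the row number to use as column names; the default `'infer'` takes the top row unless `names` is given, and `header=None` means the file has no header row
- `names`: a list of column names to use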
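A minimal sketch combining the three parameters in one call. The inline text is a stand-in that copies the first two records of the pipe-delimited marks file used below; passing `names=` directly avoids assigning `df.columns` afterwards.

```python
import io
import pandas as pd

# Stand-in for a pipe-delimited file without a header row
text = "1|Akshay|Mathematics|50|40|80\n2|Mahima|English|40|33|83\n"

df = pd.read_csv(
    io.StringIO(text),
    sep="|",      # field separator (default is ',')
    header=None,  # the file has no header row
    names=["S.No", "Name", "Subject", "Maximum Marks", "Marks Obtained", "Percentage"],
)
print(df)
```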

import pandas as pd

# The file is stored at: 'https://kh-prod-codelabs.s3.ap-south-1.amazonaws.com/marks_1-afa65670e6f9462fa1595a67f18c80af.csv'
# It is pipe-delimited ('|') and has no header row
df = pd.read_csv('https://kh-prod-codelabs.s3.ap-south-1.amazonaws.com/marks_1-afa65670e6f9462fa1595a67f18c80af.csv', header=None, sep='|')

print(df)

In [93]:
# import numpy as np
import pandas as pd

# The file is stored at: 'https://kh-prod-codelabs.s3.ap-south-1.amazonaws.com/marks_1-afa65670e6f9462fa1595a67f18c80af.csv'
# Provide your answer below

df1 = pd.read_csv('marks.csv', header = None, sep = '|')

print(df1)

     0        1            2   3   4   5
0    1   Akshay  Mathematics  50  40  80
1    2   Mahima      English  40  33  83
2    3    Vikas  Mathematics  50  42  84
3    4  Abhinav      English  40  31  78
4    5   Mahima      Science  50  40  80
5    6   Akshay      Science  50  49  98
6    7  Abhinav  Mathematics  50  47  94
7    8    Vikas      Science  50  40  80
8    9  Abhinav      Science  50  47  94
9   10    Vikas      English  40  39  98
10  11   Akshay      English  40  35  88
11  12   Mahima  Mathematics  50  43  86


In [87]:
df1.columns

Index([0, 1, 2, 3, 4, 5], dtype='int64')

In [97]:
df1.columns = ['S.No','Name', 'Subject', 'Maximum Marks', 'Marks Obtained','Percentage']
df1

    S.No     Name      Subject  Maximum Marks  Marks Obtained  Percentage
0      1   Akshay  Mathematics             50              40          80
1      2   Mahima      English             40              33          83
2      3    Vikas  Mathematics             50              42          84
3      4  Abhinav      English             40              31          78
4      5   Mahima      Science             50              40          80
5      6   Akshay      Science             50              49          98
6      7  Abhinav  Mathematics             50              47          94
7      8    Vikas      Science             50              40          80
8      9  Abhinav      Science             50              47          94
9     10    Vikas      English             40              39          98
10    11   Akshay      English             40              35          88
11    12   Mahima  Mathematics             50              43          86


In [99]:
print(df1.to_string(index=False))

 S.No    Name     Subject  Maximum Marks  Marks Obtained  Percentage
    1  Akshay Mathematics             50              40          80
    2  Mahima     English             40              33          83
    3   Vikas Mathematics             50              42          84
    4 Abhinav     English             40              31          78
    5  Mahima     Science             50              40          80
    6  Akshay     Science             50              49          98
    7 Abhinav Mathematics             50              47          94
    8   Vikas     Science             50              40          80
    9 Abhinav     Science             50              47          94
   10   Vikas     English             40              39          98
   11  Akshay     English             40              35          88
   12  Mahima Mathematics             50              43          86


In [101]:
import pandas as pd

# Use column 0 (the serial numbers) as the index instead of a default RangeIndex
df1 = pd.read_csv('marks.csv', header=None, sep='|', index_col=0)
# The index column is excluded from df1.columns, so only five names are needed
df1.columns = ['Name', 'Subject', 'Maximum Marks', 'Marks Obtained', 'Percentage']
df1.index.name = 'S.No.'

print(df1)

          Name      Subject  Maximum Marks  Marks Obtained  Percentage
S.No.                                                                 
1       Akshay  Mathematics             50              40          80
2       Mahima      English             40              33          83
3        Vikas  Mathematics             50              42          84
4      Abhinav      English             40              31          78
5       Mahima      Science             50              40          80
6       Akshay      Science             50              49          98
7      Abhinav  Mathematics             50              47          94
8        Vikas      Science             50              40          80
9      Abhinav      Science             50              47          94
10       Vikas      English             40              39          98
11      Akshay      English             40              35          88
12      Mahima  Mathematics             50              43          86
