# DataFrame Data Structure

In [1]:
import pandas as pd

In [5]:
record1 = pd.Series({'Name':'Alice','Subject':'Math','Score':78})
record2 = pd.Series({'Name':'Sam','Subject':'EVS','Score':98})
record3 = pd.Series({'Name':'Jack','Subject':'Arts','Score':67})
df = pd.DataFrame([record1, record2, record3], 
                  index = ['student1','student2','student1'])
df

Unnamed: 0,Name,Subject,Score
student1,Alice,Math,78
student2,Sam,EVS,98
student1,Jack,Arts,67
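
A minimal sketch of an alternative construction, assuming the same three records as above: a list of plain dicts can stand in for the list of Series, with the row labels still supplied through `index`. The variable name `df_from_dicts` is illustrative only.

```python
import pandas as pd

# Same three records as above, written as plain dicts instead of Series.
records = [
    {"Name": "Alice", "Subject": "Math", "Score": 78},
    {"Name": "Sam", "Subject": "EVS", "Score": 98},
    {"Name": "Jack", "Subject": "Arts", "Score": 67},
]

# The dict keys become column labels; the index list supplies the row labels.
# Row labels do not have to be unique, so 'student1' can appear twice.
df_from_dicts = pd.DataFrame(records, index=["student1", "student2", "student1"])

print(df_from_dicts.shape)
print(df_from_dicts.index.is_unique)
```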


In [6]:
df.loc['student2']

Name       Sam
Subject    EVS
Score       98
Name: student2, dtype: object

In [7]:
df.loc['student1']

Unnamed: 0,Name,Subject,Score
student1,Alice,Math,78
student1,Jack,Arts,67


In [8]:
type(df.loc['student1'])

pandas.core.frame.DataFrame

In [9]:
type(df.loc['student2'])

pandas.core.series.Series
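
The return type of `loc` depends on how many rows match the label, which can surprise code that expects one shape. As a small sketch (the variable names are illustrative), passing the label inside a list asks for a DataFrame even when only one row matches:

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Alice", "Sam", "Jack"],
     "Subject": ["Math", "EVS", "Arts"],
     "Score": [78, 98, 67]},
    index=["student1", "student2", "student1"],
)

single = df.loc["student2"]          # label matches one row: Series
repeated = df.loc["student1"]        # label matches two rows: DataFrame
always_frame = df.loc[["student2"]]  # list of labels: DataFrame, even for one match

print(type(single).__name__)
print(type(repeated).__name__)
print(type(always_frame).__name__)
```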

In [14]:
df['Name']  # analogous to column projection in a database

student1    Alice
student2      Sam
student1     Jack
Name: Name, dtype: object

In [15]:
type(df['Name'])

pandas.core.series.Series
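
Continuing the projection analogy, here is a short sketch of how the argument to `[]` affects the result: a single label gives a Series, while a list of labels gives a DataFrame, even if the list holds only one column. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Alice", "Sam", "Jack"],
     "Subject": ["Math", "EVS", "Arts"],
     "Score": [78, 98, 67]},
    index=["student1", "student2", "student1"],
)

name_series = df["Name"]            # single label: Series
name_frame = df[["Name"]]           # one-element list: DataFrame with one column
two_columns = df[["Name", "Score"]] # several labels: DataFrame in the order given

print(type(name_series).__name__, type(name_frame).__name__)
print(list(two_columns.columns))
```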

In [16]:
df

Unnamed: 0,Name,Subject,Score
student1,Alice,Math,78
student2,Sam,EVS,98
student1,Jack,Arts,67


As the cell below shows, chaining (indexing on the result of another indexing operation) comes with some costs and is best avoided when another approach is available. In particular, chaining tends to make pandas return a copy of the data instead of a view on the DataFrame. For selecting data this is not a big deal, though it may be slower than necessary. If you are changing data, though, the distinction matters and can be a source of errors: you may intend to modify the original through a view, but pandas hands you a copy and the change never reaches the original DataFrame. Prefer a single `loc` call that takes both the row label and the column label, as in the second cell below.

In [22]:
df.loc['student1']['Name']

student1    Alice
student1     Jack
Name: Name, dtype: object

In [23]:
df.loc['student1','Name']

student1    Alice
student1     Jack
Name: Name, dtype: object
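
The view-versus-copy concern matters mostly when assigning. As a hedged sketch (the new score of 80 and the name `updated` are made up for illustration), a single `loc` call with both labels writes to the frame itself, so there is no intermediate object to worry about:

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Alice", "Sam", "Jack"],
     "Subject": ["Math", "EVS", "Arts"],
     "Score": [78, 98, 67]},
    index=["student1", "student2", "student1"],
)

# Work on an explicit copy so the original frame stays untouched.
updated = df.copy()

# Row label and column label in one .loc call: the assignment targets
# `updated` directly. Both rows labelled 'student1' are affected.
updated.loc["student1", "Score"] = 80

print(updated["Score"].tolist())  # [80, 98, 80]
print(df["Score"].tolist())       # [78, 98, 67]
```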

In [30]:
print(type(df.loc['student1','Name']))

<class 'pandas.core.series.Series'>


In [28]:
df.iloc[0]

Name       Alice
Subject     Math
Score         78
Name: student1, dtype: object

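#### Note - In a DataFrame, plain `[]` indexing with a label looks up columns, so to access a row by label or position you need `loc` or `iloc`. A Series has only one axis, so plain `[]` indexing works there. Also, passing a column label as the only argument to `loc` raises a KeyError, because `loc` looks the label up in the row index.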
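
A small sketch of the failure modes described in this note, using the same data as above. The boolean flag names are illustrative; the point is only which lookups raise `KeyError`.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Alice", "Sam", "Jack"],
     "Subject": ["Math", "EVS", "Arts"],
     "Score": [78, 98, 67]},
    index=["student1", "student2", "student1"],
)

# Plain [] on a DataFrame searches the column labels, so a row label fails.
try:
    df["student2"]
    row_label_in_brackets_failed = False
except KeyError:
    row_label_in_brackets_failed = True

# .loc with a single argument searches the row index, so a column label fails.
try:
    df.loc["Name"]
    column_label_in_loc_failed = False
except KeyError:
    column_label_in_loc_failed = True

# A Series has a single axis, so plain [] with an index label works.
scores = df["Score"]
sam_score = scores["student2"]

print(row_label_in_brackets_failed, column_label_in_loc_failed, sam_score)
```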

In [33]:
df.loc[:,['Name','Score']]

Unnamed: 0,Name,Score
student1,Alice,78
student2,Sam,98
student1,Jack,67


Drop function - Note that by default `drop` does not modify the original object; it returns a copy of the Series/DataFrame without the given row or column. If you want to change the original Series/DataFrame, set the `inplace` option to True.

In [36]:
df.drop('student2')

Unnamed: 0,Name,Subject,Score
student1,Alice,Math,78
student1,Jack,Arts,67


In [37]:
df

Unnamed: 0,Name,Subject,Score
student1,Alice,Math,78
student2,Sam,EVS,98
student1,Jack,Arts,67


In [38]:
df.drop('student1')  # labels are looked up in the index by default, so both 'student1' rows are dropped

Unnamed: 0,Name,Subject,Score
student2,Sam,EVS,98


In [40]:
df.drop('Subject',axis=1)

Unnamed: 0,Name,Score
student1,Alice,78
student2,Sam,98
student1,Jack,67


In [41]:
df.drop(columns=['Name','Score'])

Unnamed: 0,Subject
student1,Math
student2,EVS
student1,Arts
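
As a further sketch of the same idea, `drop` also accepts `index=` alongside `columns=`, so rows and columns can be removed in one call. The result is assigned to a new name (`trimmed`, chosen here for illustration) and the original frame keeps its shape.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Alice", "Sam", "Jack"],
     "Subject": ["Math", "EVS", "Arts"],
     "Score": [78, 98, 67]},
    index=["student1", "student2", "student1"],
)

# Remove the 'student2' row and the 'Subject' column in a single call.
trimmed = df.drop(index="student2", columns="Subject")

print(trimmed.shape)  # rows and columns left after dropping
print(df.shape)       # the original is unchanged
```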


In [45]:
copy_df = df.copy()
copy_df

Unnamed: 0,Name,Subject,Score
student1,Alice,Math,78
student2,Sam,EVS,98
student1,Jack,Arts,67


In [46]:
copy_df.drop('Subject', inplace=True, axis=1)  # modifies copy_df directly instead of returning a new DataFrame
copy_df

Unnamed: 0,Name,Score
student1,Alice,78
student2,Sam,98
student1,Jack,67


If you want to delete a column directly, you can use the `del` keyword.

In [52]:
del copy_df['Score']
copy_df

Unnamed: 0,Name
student1,Alice
student2,Sam
student1,Jack


In [55]:
copy_df['Rank'] = None
copy_df

Unnamed: 0,Name,Rank
student1,Alice,
student2,Sam,
student1,Jack,
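
Assigning to a new column label creates the column. A minimal sketch of three common right-hand sides follows; the frame name `roster`, the `Passed` column, and the rank values are invented for illustration. A scalar (including `None`) is broadcast to every row, while a list is assigned position by position and must match the number of rows.

```python
import pandas as pd

roster = pd.DataFrame(
    {"Name": ["Alice", "Sam", "Jack"]},
    index=["student1", "student2", "student1"],
)

# None is broadcast to every row, giving a column of missing values.
roster["Rank"] = None
all_missing = bool(roster["Rank"].isna().all())

# Any scalar is broadcast the same way.
roster["Passed"] = True

# A list is assigned positionally, so duplicate index labels are not a problem.
roster["Rank"] = [2, 1, 3]

print(all_missing)
print(roster["Rank"].tolist())
```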
