A DataFrame is essentially a two-dimensional Series. It has an index of row labels and multiple columns of content, and each column has its own label.
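Here is a minimal sketch that makes the two labeled axes concrete, using illustrative data. It builds a DataFrame from a dict of columns, which is a different constructor form from the row-based one used below. It then inspects the row labels, the column labels and the shape:

```python
import pandas as pd

# Illustrative data: each dict key is a column label, each list holds that column's values
df = pd.DataFrame({'Name': ['Alice', 'Jack'], 'Score': [85, 82]},
                  index=['school1', 'school2'])

print(df.index.tolist())    # row labels: ['school1', 'school2']
print(df.columns.tolist())  # column labels: ['Name', 'Score']
print(df.shape)             # (rows, columns): (2, 2)
```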

In [29]:
import pandas as pd

In [30]:
# create 3 school records for students and their class grades

record1 = pd.Series({'Name': 'Alice',
                    'Class': 'Physics',
                    'Score': 85})

record2 = pd.Series({'Name': 'Jack',
                    'Class': 'Chemistry',
                    'Score': 82})

record3 = pd.Series({'Name': 'Helen',
                    'Class': 'Biology',
                    'Score': 90})

Like a Series, a DataFrame is indexed.

Here, I'll use a group of Series, where each Series represents a row of data.

As with the Series constructor, we pass the individual items in a list as the first argument and the index values as the second argument (`index`).

In [31]:
df = pd.DataFrame([record1, record2, record3],
                 index=['school1', 'school2', 'school1'])

In [32]:
df.head()

|         | Name  | Class     | Score |
|---------|-------|-----------|-------|
| school1 | Alice | Physics   | 85    |
| school2 | Jack  | Chemistry | 82    |
| school1 | Helen | Biology   | 90    |
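The sketch below reuses two of the records to check how this constructor maps its inputs. It confirms that each Series' own index labels become the column labels. It also shows what happens when the `index` argument is omitted: pandas assigns default integer row labels starting at 0.

```python
import pandas as pd

record1 = pd.Series({'Name': 'Alice', 'Class': 'Physics', 'Score': 85})
record2 = pd.Series({'Name': 'Jack', 'Class': 'Chemistry', 'Score': 82})

# Each Series becomes one row; its index labels become the column labels
df = pd.DataFrame([record1, record2], index=['school1', 'school2'])
print(df.columns.tolist())  # ['Name', 'Class', 'Score']

# Without the index argument, the rows get default integer labels
default_df = pd.DataFrame([record1, record2])
print(default_df.index.tolist())  # [0, 1]
```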


You can also use a list of dictionaries, where each dictionary represents a row of data.

In [33]:
students = [{'Name': 'Alice', 'Class': 'Physics', 'Score': 85},
            {'Name': 'Jack', 'Class': 'Chemistry', 'Score': 82},
            {'Name': 'Helen', 'Class': 'Biology', 'Score': 90}]

# then pass this list of dicts to the DataFrame constructor
df = pd.DataFrame(students, index=['school1', 'school2', 'school1'])

df.head()

|         | Name  | Class     | Score |
|---------|-------|-----------|-------|
| school1 | Alice | Physics   | 85    |
| school2 | Jack  | Chemistry | 82    |
| school1 | Helen | Biology   | 90    |
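One practical difference with dictionaries is that the rows do not need identical keys. In this sketch the second record is made up and has no `'Score'` key, so pandas fills that cell with `NaN`:

```python
import pandas as pd

# Made-up records: the second dict has no 'Score' key
students = [{'Name': 'Alice', 'Class': 'Physics', 'Score': 85},
            {'Name': 'Jack', 'Class': 'Chemistry'}]

df = pd.DataFrame(students, index=['school1', 'school2'])

# Columns are collected from all the dicts; a missing key becomes NaN
print(df.columns.tolist())         # ['Name', 'Class', 'Score']
print(df.loc['school2', 'Score'])  # nan
```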


As with a Series, we can extract data using the `.iloc` and `.loc` attributes.
Because a DataFrame is two-dimensional, passing a single label to the `.loc` indexing operator returns a Series if only one row matches that label.


In [34]:
df.loc['school2']

Name          Jack
Class    Chemistry
Score           82
Name: school2, dtype: object

In [35]:
type(df.loc['school2'])

pandas.core.series.Series
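`.iloc` selects by integer position rather than by label. This minimal sketch rebuilds the same three records. It shows that position 1 returns the same row as the label `'school2'`:

```python
import pandas as pd

df = pd.DataFrame([{'Name': 'Alice', 'Class': 'Physics', 'Score': 85},
                   {'Name': 'Jack', 'Class': 'Chemistry', 'Score': 82},
                   {'Name': 'Helen', 'Class': 'Biology', 'Score': 90}],
                  index=['school1', 'school2', 'school1'])

# .iloc counts rows from 0, so position 1 is the second row (school2)
row = df.iloc[1]
print(type(row).__name__)  # Series
print(row['Name'])         # Jack

# Same row as the label-based lookup
print(row.equals(df.loc['school2']))  # True
```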

Index labels and column names, along either the vertical or the horizontal axis, can be non-unique.
For example, there are two rows labeled `school1`.
If we pass a single label to the DataFrame's `.loc` attribute and multiple rows match it, those rows are returned not as a new Series but as a new DataFrame.


In [36]:
df.loc['school1']

|         | Name  | Class   | Score |
|---------|-------|---------|-------|
| school1 | Alice | Physics | 85    |
| school1 | Helen | Biology | 90    |


In [37]:
type(df.loc['school1'])

pandas.core.frame.DataFrame
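Whether `.loc` returns a Series or a DataFrame therefore depends on how many rows match. The sketch below shows two related checks. `Index.is_unique` reports whether any labels repeat. Passing the label inside a list makes `.loc` return a DataFrame even when only one row matches.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Jack', 'Helen'],
                   'Score': [85, 82, 90]},
                  index=['school1', 'school2', 'school1'])

print(df.index.is_unique)  # False: 'school1' appears twice

# A list of labels always yields a DataFrame, even for a single match
print(type(df.loc[['school2']]).__name__)  # DataFrame
print(type(df.loc['school2']).__name__)    # Series
```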

### Select data based on multiple axes

In [38]:
# find school1's student names
df.loc['school1', 'Name']

school1    Alice
school1    Helen
Name: Name, dtype: object
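When the row label is unique, the same two-argument form narrows to a single value. Here is a sketch with the same data, keeping only the `Name` and `Score` columns for brevity:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Jack', 'Helen'],
                   'Score': [85, 82, 90]},
                  index=['school1', 'school2', 'school1'])

# 'school2' labels exactly one row, so this returns a single value
print(df.loc['school2', 'Name'])  # Jack

# 'school1' labels two rows, so this returns a Series with one value per row
print(df.loc['school1', 'Name'].tolist())  # ['Alice', 'Helen']
```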

One way to select a single column is to transpose the DataFrame, which swaps its rows and columns. This is done using the `.T` attribute.

In [39]:
df.T

|       | school1 | school2   | school1 |
|-------|---------|-----------|---------|
| Name  | Alice   | Jack      | Helen   |
| Class | Physics | Chemistry | Biology |
| Score | 85      | 82        | 90      |


In [40]:
# then call .loc on the transpose to get only the student names
df.T.loc['Name']

school1    Alice
school2     Jack
school1    Helen
Name: Name, dtype: object

In [41]:
df['Name']

school1    Alice
school2     Jack
school1    Helen
Name: Name, dtype: object

In [42]:
# .loc looks up row labels, and 'Name' is a column label, so this raises a KeyError
#df.loc['Name']

In [43]:
# type of single col projection is a Series object
type(df['Name'])

pandas.core.series.Series
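Transposing is not the only way to select a column with `.loc`. If you pass `:` for the rows and a column label as the second argument, `.loc` selects the column directly. The sketch below compares that approach with the indexing operator and with the transpose approach used above.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Jack', 'Helen'],
                   'Score': [85, 82, 90]},
                  index=['school1', 'school2', 'school1'])

# ':' selects every row; 'Name' selects the column
by_loc = df.loc[:, 'Name']
by_bracket = df['Name']
by_transpose = df.T.loc['Name']

print(by_loc.tolist())                           # ['Alice', 'Jack', 'Helen']
print(by_loc.equals(by_bracket))                 # True
print(by_loc.tolist() == by_transpose.tolist())  # True
```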

In [44]:
# chain operations: .loc selects the rows labeled school1, then the indexing operator projects the Name column from those rows
df.loc['school1']['Name']

school1    Alice
school1    Helen
Name: Name, dtype: object

In [45]:
print(type(df.loc['school1'])) # DataFrame
print(type(df.loc['school1']['Name'])) # Series

<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.series.Series'>
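Chained indexing works for reading. However, a single `.loc` call with both the row and column labels gives the same result in one step. pandas recommends the single-call form for assignment, because a chained assignment may act on a temporary copy instead of the original DataFrame. Here is a minimal sketch; the score of 0 is just a placeholder value:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Jack', 'Helen'],
                   'Score': [85, 82, 90]},
                  index=['school1', 'school2', 'school1'])

# Two steps: select the school1 rows, then the Name column
chained = df.loc['school1']['Name']
# One step: row label and column label in a single .loc call
single = df.loc['school1', 'Name']
print(chained.equals(single))  # True

# Assigning through a single .loc call updates df itself (both school1 rows)
df.loc['school1', 'Score'] = 0
print(df['Score'].tolist())  # [0, 82, 0]
```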


### `.loc` supports slicing

In [46]:
# Find the names and scores for all schools using the .loc operator
# the first argument, :, selects all rows (schools)
# the second argument is a list of the columns we are interested in
df.loc[:,['Name', 'Score']]

|         | Name  | Score |
|---------|-------|-------|
| school1 | Alice | 85    |
| school2 | Jack  | 82    |
| school1 | Helen | 90    |
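One detail is worth knowing when slicing. A `.loc` label slice includes both endpoints. An `.iloc` position slice excludes the stop position, as in ordinary Python slicing. Here is a short sketch on the same columns:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Jack', 'Helen'],
                   'Class': ['Physics', 'Chemistry', 'Biology'],
                   'Score': [85, 82, 90]},
                  index=['school1', 'school2', 'school1'])

# Label slice with .loc: the end label 'Class' is included
print(df.loc[:, 'Name':'Class'].columns.tolist())  # ['Name', 'Class']

# Position slice with .iloc: the stop position 1 is excluded
print(df.iloc[:, 0:1].columns.tolist())  # ['Name']

# .iloc also slices rows by position: the first two rows
print(df.iloc[0:2]['Name'].tolist())  # ['Alice', 'Jack']
```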


### 2 Methods to drop data

#### Method 1: Use the `drop` method

By default, it does not change the DataFrame. Instead, it returns a copy with the given rows removed. Note that dropping `school1` removes both rows with that label.

In [47]:
df.drop('school1')

|         | Name | Class     | Score |
|---------|------|-----------|-------|
| school2 | Jack | Chemistry | 82    |


In [48]:
df

|         | Name  | Class     | Score |
|---------|-------|-----------|-------|
| school1 | Alice | Physics   | 85    |
| school2 | Jack  | Chemistry | 82    |
| school1 | Helen | Biology   | 90    |
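Because `drop` returns a new DataFrame, the usual pattern is to reassign the result when you want to keep it. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Jack', 'Helen'],
                   'Score': [85, 82, 90]},
                  index=['school1', 'school2', 'school1'])

# drop returns a new DataFrame and leaves df untouched
result = df.drop('school1')
print(len(df), len(result))  # 3 1

# Reassign the result to keep the change
df = df.drop('school2')
print(df.index.tolist())  # ['school1', 'school1']
```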


`drop` has two useful optional parameters.

The first is `inplace`. If it is set to `True`, the DataFrame is updated in place instead of a copy being returned.

The second is `axis`, which specifies the axis to drop from. Its default value is 0, which indicates the row axis.

Set it to 1 to drop a column instead.
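The sketch below illustrates two related details. The `columns=` keyword is an equivalent way to drop a column without specifying `axis=1`. With `inplace=True`, the method modifies the DataFrame and returns `None`, so you should not assign its result back.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Jack', 'Helen'],
                   'Class': ['Physics', 'Chemistry', 'Biology'],
                   'Score': [85, 82, 90]},
                  index=['school1', 'school2', 'school1'])

# axis=1 and columns= are two spellings of the same column drop
by_axis = df.drop('Name', axis=1)
by_keyword = df.drop(columns='Name')
print(by_axis.equals(by_keyword))  # True

# With inplace=True, df itself changes and drop returns None
returned = df.drop('Name', axis=1, inplace=True)
print(returned)             # None
print(df.columns.tolist())  # ['Class', 'Score']
```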


In [49]:
# make a copy of DF using .copy()
copy_df = df.copy()

# drop the name col in copy
copy_df.drop("Name", inplace=True, axis=1)

copy_df

|         | Class     | Score |
|---------|-----------|-------|
| school1 | Physics   | 85    |
| school2 | Chemistry | 82    |
| school1 | Biology   | 90    |


#### Method 2: Use the `del` keyword with the indexing operator

It modifies the DataFrame immediately, in place.

In [50]:
del copy_df['Class']
copy_df

|         | Score |
|---------|-------|
| school1 | 85    |
| school2 | 82    |
| school1 | 90    |
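For comparison, `DataFrame.pop` also removes a column in place. Unlike `del`, it returns the removed column as a Series. This sketch starts from a DataFrame shaped like `copy_df` before the `del` step:

```python
import pandas as pd

copy_df = pd.DataFrame({'Class': ['Physics', 'Chemistry', 'Biology'],
                        'Score': [85, 82, 90]},
                       index=['school1', 'school2', 'school1'])

# del removes the column in place and produces no value
del copy_df['Class']
print(copy_df.columns.tolist())  # ['Score']

# pop also removes a column in place, but returns it as a Series
scores = copy_df.pop('Score')
print(scores.tolist())           # [85, 82, 90]
print(copy_df.columns.tolist())  # []
```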


### Adding a new column with the indexing operator, using a default value of `None`

In [54]:
df['ClassRanking'] = None
df

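|         | Name  | Class     | Score | ClassRanking |
|---------|-------|-----------|-------|--------------|
| school1 | Alice | Physics   | 85    | None         |
| school2 | Jack  | Chemistry | 82    | None         |
| school1 | Helen | Biology   | 90    | None         |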
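The indexing operator accepts more than a scalar default. In this sketch the ranking values are illustrative and were assigned by hand from the scores. `Passed` is a hypothetical column computed from an existing one. When you assign a list, it must supply one value per row.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Jack', 'Helen'],
                   'Score': [85, 82, 90]},
                  index=['school1', 'school2', 'school1'])

# A scalar is broadcast to every row
df['ClassRanking'] = None
print(df['ClassRanking'].isna().all())  # True

# A list supplies one value per row, in row order (illustrative rankings)
df['ClassRanking'] = [2, 3, 1]

# An expression over an existing column produces a computed column
df['Passed'] = df['Score'] >= 85
print(df['Passed'].tolist())  # [True, False, True]
```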
