In [1]:
import pandas as pd 

In [4]:
cars = pd.read_csv('data/cars.csv')

In [6]:
people = pd.read_csv('data/people.csv')

In [5]:
cars.head()

,Name,MPG,Cylinders,Displacement,Horsepower,Weight,Acceleration,Origin
0,Chevrolet Chevelle Malibu,18.0,8,307.0,130,3504,12.0,US
1,Buick Skylark 320,15.0,8,350.0,165,3693,11.5,US
2,Plymouth Satellite,18.0,8,318.0,150,3436,11.0,US
3,AMC Rebel SST,16.0,8,304.0,150,3433,12.0,US
4,Ford Torino,17.0,8,302.0,140,3449,10.5,US


In [7]:
people.head()

,Name,Age,Weight,Height,Gender
0,Rita,27,67,1.65,F
1,Dexter,35,81,1.84,M
2,Anna,29,55,1.6,F
3,Bob,41,73,1.79,M


We can access a specific cell of a pandas dataframe using the **DataFrame.iloc** property by providing the integer positions of the row and the column. The name **iloc stands for integer location**.

When we only want to access a single cell, it is recommended to use the **DataFrame.iat** property. It takes the same row and column positions as iloc but doesn't allow ranges.
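
To make the difference between the two properties concrete, here is a minimal sketch. It assumes a small dataframe, `people_demo`, built by hand from the rows shown in the `people.head()` output above. The name is hypothetical and stands in for the real CSV file.

```python
import pandas as pd

# Hypothetical stand-in for data/people.csv, built from the rows shown above.
people_demo = pd.DataFrame({
    "Name": ["Rita", "Dexter", "Anna", "Bob"],
    "Age": [27, 35, 29, 41],
    "Weight": [67, 81, 55, 73],
    "Height": [1.65, 1.84, 1.60, 1.79],
    "Gender": ["F", "M", "F", "M"],
})

# iloc with two integers returns a single cell: row position 2, column position 1 (Age).
anna_age = people_demo.iloc[2, 1]

# iloc also accepts ranges; the end position is excluded, as with Python lists.
first_two = people_demo.iloc[0:2, 0:3]

# iat takes exactly one row position and one column position.
first_name = people_demo.iat[0, 0]

print(anna_age)         # 29
print(first_two.shape)  # (2, 3)
print(first_name)       # Rita
```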

In [8]:
print(people.iat[0, 0])

Rita


In [9]:
cars_odd = cars.iloc[1::2,:3]
fifth_odd_car_name = cars_odd.iat[4,0]
last_four = cars_odd.tail(4)
print(last_four)

                  Name   MPG  Cylinders
399  Dodge Charger 2.2  36.0          4
401    Ford Mustang GL  27.0          4
403      Dodge Rampage  32.0          4
405         Chevy S-10  31.0          4


To access the data with row and column labels instead of integer positions, we can use the **DataFrame.loc** property.

In [10]:
print(people.loc[1, 'Name'])

Dexter
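
Labels and positions only coincide while the dataframe still has its default index. After a selection such as `cars.iloc[1::2, :3]`, the rows keep their original labels (399, 401, and so on in the output above) even though their positions start again at 0. The sketch below illustrates this on a small hand-built dataframe; `people_demo` and `odd_rows` are hypothetical names used only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the people data shown above.
people_demo = pd.DataFrame({
    "Name": ["Rita", "Dexter", "Anna", "Bob"],
    "Age": [27, 35, 29, 41],
})

# Keep every second row starting at position 1; the original labels 1 and 3 are kept.
odd_rows = people_demo.iloc[1::2]

by_position = odd_rows.iloc[0, 0]    # first row by position
by_label = odd_rows.loc[1, "Name"]   # row whose label is 1

print(list(odd_rows.index))  # [1, 3]
print(by_position)           # Dexter
print(by_label)              # Dexter

# Label 0 was filtered out, so odd_rows.loc[0, "Name"] would raise a KeyError.
print(0 in odd_rows.index)   # False
```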


By default, when we read a CSV, pandas uses the row positions (0, 1, 2, ...) as row labels. If we want something else, we need to say so explicitly. One way is the index_col keyword argument of the pandas.read_csv() function: we pass it the position of the column that we want to use as row labels (a column name is also accepted).

Here's how we could use the Name column (index 0) as row labels when we read the CSV:

In [12]:
people = pd.read_csv('data/people.csv', index_col=0)

In [13]:
people.head()

Name,Age,Weight,Height,Gender
Rita,27,67,1.65,F
Dexter,35,81,1.84,M
Anna,29,55,1.6,F
Bob,41,73,1.79,M
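
As a self-contained sketch of the index_col argument, the example below reads the same rows from an in-memory string instead of a file. The `csv_text` variable is a hypothetical stand-in for `data/people.csv`, assuming the file's first column is Name. It also shows that passing the column name gives the same result as passing its position.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for data/people.csv.
csv_text = """Name,Age,Weight,Height,Gender
Rita,27,67,1.65,F
Dexter,35,81,1.84,M
Anna,29,55,1.6,F
Bob,41,73,1.79,M
"""

# Select the index column by position...
by_position = pd.read_csv(io.StringIO(csv_text), index_col=0)

# ...or by name; both produce the same dataframe.
by_name = pd.read_csv(io.StringIO(csv_text), index_col="Name")

print(by_position.equals(by_name))   # True
print(by_name.loc["Anna", "Age"])    # 29
```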


In [15]:
people1 = pd.read_csv('data/people.csv')
people1.head()

,Name,Age,Weight,Height,Gender
0,Rita,27,67,1.65,F
1,Dexter,35,81,1.84,M
2,Anna,29,55,1.6,F
3,Bob,41,73,1.79,M


We can also change the index after loading the dataframe using the **DataFrame.set_index()** method. By default, this method returns a new dataframe with the new index and leaves the original unchanged. To modify the existing dataframe instead, set the **inplace** keyword argument to **True**.

In [18]:
people_indexed_on_name = people1.set_index('Name') # Get a new dataframe; people1 is unchanged
people1.set_index('Name', inplace=True)            # Change the people1 dataframe directly
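
The sketch below checks the default behaviour on a small hand-built dataframe (`people_demo` is a hypothetical name): after calling set_index() without inplace, the original still has its Name column. It also shows reassigning the result to the same variable, which is a common alternative to `inplace=True` with the same end result.

```python
import pandas as pd

# Hypothetical stand-in for the people data shown above.
people_demo = pd.DataFrame({
    "Name": ["Rita", "Dexter", "Anna", "Bob"],
    "Age": [27, 35, 29, 41],
})

# Without inplace, set_index returns a new dataframe.
indexed = people_demo.set_index("Name")

# The original is untouched: Name is still a regular column.
original_columns = list(people_demo.columns)
print(original_columns)     # ['Name', 'Age']
print(list(indexed.index))  # ['Rita', 'Dexter', 'Anna', 'Bob']

# Reassigning the result replaces the old dataframe with the indexed one.
people_demo = people_demo.set_index("Name")
print(people_demo.loc["Bob", "Age"])  # 41
```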

In [19]:
cars.set_index('Name', inplace=True)
weight_torino = cars.loc['Ford Torino', 'Weight']

In [20]:
weight_torino

3449

When we **convert a column into an index, that column is no longer a column in our dataframe**. For example, above we set the Name column as the index of the people1 dataframe. This dataframe now has four columns: Age, Weight, Height, and Gender.

In [21]:
print(people1.shape)

(4, 4)
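
The effect on the shape can be reproduced with a small hand-built dataframe (`people_demo` is a hypothetical name). The sketch also shows two related options that the text above does not cover and that are included here only as a pointer: the `drop=False` argument of set_index(), which keeps the column as well as the index, and `reset_index()`, which turns the index back into a column.

```python
import pandas as pd

# Hypothetical stand-in for the people data shown above.
people_demo = pd.DataFrame({
    "Name": ["Rita", "Dexter", "Anna", "Bob"],
    "Age": [27, 35, 29, 41],
    "Weight": [67, 81, 55, 73],
    "Height": [1.65, 1.84, 1.60, 1.79],
    "Gender": ["F", "M", "F", "M"],
})

# Moving Name into the index removes it from the columns.
indexed = people_demo.set_index("Name")
print(people_demo.shape)  # (4, 5)
print(indexed.shape)      # (4, 4)

# drop=False keeps Name as a column in addition to using it as the index.
kept = people_demo.set_index("Name", drop=False)
print(kept.shape)         # (4, 5)

# reset_index moves the index back into a regular column.
restored = indexed.reset_index()
print(list(restored.columns))  # ['Name', 'Age', 'Weight', 'Height', 'Gender']
```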
