# This Jupyter notebook accompanies the DataCamp pandas lesson
<hr>

When working with data, the form of this data can vary greatly, but pretty often, you can boil it down to a **tabular structure**.

<img src="Images/tabular_examples.jpg"> </img>

Every row is a measurement, or an observation, and for each observation there are different variables. A row is an observation while a column is a variable. 

For each measurement there's of course the temperature but also the date and time of the measurement, and the location. 

<hr>


To start working on this data in Python, you'll need some kind of rectangular data structure. 

You already know what a 2D NumPy array is. It is an option, but not necessarily the best one: remember that a NumPy array can only house elements of the same data type. 

In the example data set we can observe that there are different data types. 

To handle multiple data types in a rectangular data structure we will use  **Pandas**. 
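
To make the limitation concrete, here is a small sketch showing that NumPy coerces mixed values to a single type, while a DataFrame keeps one type per column. The two-row sample and the names `mixed` and `frame` are illustrative assumptions, not part of the lesson data.

```python
import numpy as np
import pandas as pd

# A 2D NumPy array has a single dtype, so the numbers are coerced to strings.
mixed = np.array([["Jan", 4.81], ["Feb", 3.31]])
print(mixed.dtype.kind)  # "U" stands for a Unicode string dtype

# A DataFrame stores each column with its own dtype.
frame = pd.DataFrame({"month": ["Jan", "Feb"], "precip": [4.81, 3.31]})
print(frame.dtypes)
```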

Pandas is a high-level data manipulation tool developed by Wes McKinney, built on the NumPy package. 

Compared to Numpy, it's more high level, making it interesting for data scientists all over the world. 

In pandas, we store the tabular data in a rectangular table called a **DataFrame**. 

<img src="Images/dataframe.jpg"></img>

### Note:
Each row of the DataFrame has a unique row label. 

The columns, or variables, also have labels. 

Notice that the values in the different columns have different data types; this is a very big difference with 2D Numpy arrays.

<hr>

## Creating A DataFrame From A Dictionary

Using the distinctive curly brackets, we create key value pairs. 

The keys are the column labels, and the values are the corresponding columns, in list form. 

After importing the pandas package with ``` import pandas as pd ```, you can create a DataFrame from the dictionary using the ```pd.DataFrame()``` constructor. 

``` df = pd.DataFrame(dict_name)```

## Note: 
- ```DataFrame``` is capitalized
- Every key must map to a list of the same length, so that all columns have the same number of elements
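
As a quick illustration of the second note, the sketch below passes a dictionary whose lists have different lengths. The dictionary `uneven` is a hypothetical example; pandas rejects it with a `ValueError`.

```python
import pandas as pd

# Hypothetical dictionary whose columns have different lengths.
uneven = {"Key1": ["value1", "value2", "value3"],
          "Key2": [1, 2]}

try:
    pd.DataFrame(uneven)
    raised = False
except ValueError as err:
    # pandas refuses to build a table with ragged columns.
    raised = True
    print("Could not build DataFrame:", err)
```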

In [1]:
import pandas as pd
someDict = {"Key1": ['value1', 'value2', 'value3'],
            "Key2": [1, 2, 3],
            "Key3": [4, 5, 6]}
df = pd.DataFrame(someDict)
print(df)

     Key1  Key2  Key3
0  value1     1     4
1  value2     2     5
2  value3     3     6


Pandas assigned some automatic row labels from 0 up to 2. 
To specify them manually, you can set the index attribute of df to a list with the correct labels. 

In [2]:
df.index = ["sample1", "sample2", "sample3"]
print(df)

           Key1  Key2  Key3
sample1  value1     1     4
sample2  value2     2     5
sample3  value3     3     6
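
As an alternative to assigning the index attribute afterwards, the row labels can also be supplied when the DataFrame is built, through the `index` parameter of `pd.DataFrame()`. A minimal sketch using the same dictionary (the name `labeled` is just for illustration):

```python
import pandas as pd

someDict = {"Key1": ['value1', 'value2', 'value3'],
            "Key2": [1, 2, 3],
            "Key3": [4, 5, 6]}

# Pass the row labels at construction time instead of assigning df.index later.
labeled = pd.DataFrame(someDict, index=["sample1", "sample2", "sample3"])
print(labeled)
```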


<hr> 

## DataFrame from CSV file 

Most of the time you will not build DataFrames manually; instead, you will import CSV files. **CSV** stands for comma-separated values. 

We will import this data into Python using the pandas ```read_csv()``` function. 

You'll pass the path of the file as an argument.

In [3]:
path = "Weather.csv"
df = pd.read_csv(path)
df

Unnamed: 0,MONTH,MLY-PRCP-NORMAL,MLY-PRCP-25PCTL,MLY-PRCP-75PCTL,MLY-TAVG-NORMAL
0,Jan,4.81,3.29,6.07,42.1
1,Feb,3.31,2.09,4.44,43.4
2,Mar,3.51,2.6,3.93,46.6
3,Apr,2.77,2.1,2.94,50.5
4,May,2.16,1.34,2.91,56.0
5,Jun,1.63,1.04,2.25,61.0
6,Jul,0.79,0.3,1.1,65.9
7,Aug,0.97,0.33,1.24,66.5
8,Sep,1.52,0.64,2.06,61.6
9,Oct,3.41,2.03,4.41,53.3


When we print out the DataFrame we can notice that there is a problem. 

The row labels are seen as a column in their own right. 

To solve this, we have to tell the ```read_csv()``` function that the first column contains the row labels, using the argument ```index_col=0```.

In [4]:
df = pd.read_csv(path, index_col=0)
print(df)

       MLY-PRCP-NORMAL  MLY-PRCP-25PCTL  MLY-PRCP-75PCTL  MLY-TAVG-NORMAL
MONTH                                                                    
Jan               4.81             3.29             6.07             42.1
Feb               3.31             2.09             4.44             43.4
Mar               3.51             2.60             3.93             46.6
Apr               2.77             2.10             2.94             50.5
May               2.16             1.34             2.91             56.0
Jun               1.63             1.04             2.25             61.0
Jul               0.79             0.30             1.10             65.9
Aug               0.97             0.33             1.24             66.5
Sep               1.52             0.64             2.06             61.6
Oct               3.41             2.03             4.41             53.3
Nov               5.84             4.53             7.17             46.2
Dec               5.43             3.9

[Pandas read_csv() function documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html)
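
The effect of ```index_col=0``` can be tried without a file on disk. The sketch below is an illustration under assumptions: it feeds a two-row CSV string (values borrowed from the table above, with MONTH assumed to be the first column) to `read_csv()` through `io.StringIO`.

```python
import io
import pandas as pd

# A tiny in-memory CSV standing in for a file on disk (layout is an assumption).
csv_text = "MONTH,MLY-PRCP-NORMAL,MLY-TAVG-NORMAL\nJan,4.81,42.1\nFeb,3.31,43.4\n"

# Without index_col, pandas generates integer row labels and keeps MONTH as a column.
default = pd.read_csv(io.StringIO(csv_text))
print(default.index.tolist())

# With index_col=0, the first column becomes the row labels.
labeled = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(labeled.index.tolist())
```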

<hr> 

## Dictionary to DataFrame (1)
Pandas is an open source library, providing high-performance, easy-to-use data structures and data analysis tools for Python. Sounds promising!

The DataFrame is one of Pandas' most important data structures. It's basically a way to store tabular data where you can label the rows and the columns. One way to build a DataFrame is from a dictionary.

In the exercises that follow you will be working with vehicle data from different countries. Each observation corresponds to a country and the columns give information about the number of vehicles per capita, whether people drive left or right, and so on.

Three lists are defined in the script:

- names, containing the country names for which data is available.
- dr, a list with booleans that tells whether people drive left or right in the corresponding country.
- cpc, the number of motor vehicles per 1000 people in the corresponding country.

Each dictionary key is a column label and each value is a list which contains the column elements.

## Instructions
- Import pandas as pd.
- Use the pre-defined lists to create a dictionary called my_dict. There should be three key-value pairs:
    - key 'country' and value names.
    - key 'drives_right' and value dr.
    - key 'cars_per_cap' and value cpc.
- Use pd.DataFrame() to turn your dict into a DataFrame called cars.
- Print out cars and see how beautiful it is.

In [5]:
# Pre-defined lists
names = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt']
dr =  [True, False, False, False, True, True, True]
cpc = [809, 731, 588, 18, 200, 70, 45]

# Import pandas as pd
import pandas as pd

# Create dictionary my_dict with three key:value pairs: my_dict
my_dict = {'country':names, 'drives_right':dr, 'cars_per_cap':cpc}

# Build a DataFrame cars from my_dict: cars
cars = pd.DataFrame(my_dict)

# Print cars
print(cars)

         country  drives_right  cars_per_cap
0  United States          True           809
1      Australia         False           731
2          Japan         False           588
3          India         False            18
4         Russia          True           200
5        Morocco          True            70
6          Egypt          True            45


<hr>

## Dictionary to DataFrame (2)
The Python code that solves the previous exercise is included below. Have you noticed that the row labels (i.e. the labels for the different observations) were automatically set to integers from 0 up to 6?

To solve this, a list row_labels has been created. You can use it to specify the row labels of the cars DataFrame. You do this by setting the index attribute of cars, which you can access as cars.index.

## Instructions
- Run the code to see that, indeed, the row labels are not correctly set.
- Specify the row labels by setting cars.index equal to row_labels.
- Print out cars again and check if the row labels are correct this time.

In [6]:
import pandas as pd

# Build cars DataFrame
names = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt']
dr =  [True, False, False, False, True, True, True]
cpc = [809, 731, 588, 18, 200, 70, 45]
cars_dict = { 'country':names, 'drives_right':dr, 'cars_per_cap':cpc }  # named cars_dict to avoid shadowing the built-in dict
cars = pd.DataFrame(cars_dict)
print(cars)

# Definition of row_labels
row_labels = ['US', 'AUS', 'JAP', 'IN', 'RU', 'MOR', 'EG']

# Specify row labels of cars
cars.index = row_labels

# Print cars again
print(cars)

         country  drives_right  cars_per_cap
0  United States          True           809
1      Australia         False           731
2          Japan         False           588
3          India         False            18
4         Russia          True           200
5        Morocco          True            70
6          Egypt          True            45
           country  drives_right  cars_per_cap
US   United States          True           809
AUS      Australia         False           731
JAP          Japan         False           588
IN           India         False            18
RU          Russia          True           200
MOR        Morocco          True            70
EG           Egypt          True            45


<hr>

## Index and Select Data 

There are numerous ways in which you can index and select data from DataFrames. 

First we will discuss using Square Brackets, followed by advanced data access methods, loc and iloc, that make Pandas extra powerful. 


Suppose that you only want to select the MLY-PRCP-NORMAL column from the weather DataFrame. 

You simply use square brackets passing in the column label. Python will print out the entire column, together with the row labels. 

In [7]:
weather = pd.read_csv(path, index_col=0)
print(weather)
print()
print(weather["MLY-PRCP-NORMAL"])

       MLY-PRCP-NORMAL  MLY-PRCP-25PCTL  MLY-PRCP-75PCTL  MLY-TAVG-NORMAL
MONTH                                                                    
Jan               4.81             3.29             6.07             42.1
Feb               3.31             2.09             4.44             43.4
Mar               3.51             2.60             3.93             46.6
Apr               2.77             2.10             2.94             50.5
May               2.16             1.34             2.91             56.0
Jun               1.63             1.04             2.25             61.0
Jul               0.79             0.30             1.10             65.9
Aug               0.97             0.33             1.24             66.5
Sep               1.52             0.64             2.06             61.6
Oct               3.41             2.03             4.41             53.3
Nov               5.84             4.53             7.17             46.2
Dec               5.43             3.9

There's something interesting here: the last line of the column output says ```Name: MLY-PRCP-NORMAL, dtype: float64```. We're clearly not dealing with a regular DataFrame here. 

Let's find out the type of the object that gets returned, using Python's built-in ```type()``` function. 

In [8]:
print(type(weather))
print(type(weather["MLY-PRCP-NORMAL"]))

<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.series.Series'>


## Pandas Series

We can see that the type of ```weather["MLY-PRCP-NORMAL"]``` (a DataFrame column) is a **Pandas Series**. 

<span style="background-color:yellow">You can think of the Series as a one-dimensional array that can be labeled, just like the DataFrame. 
Put differently, if you paste a bunch of Series together, you can create a DataFrame.
</span>
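
The "pasting together" idea can be sketched with `pd.concat()`, which joins Series side by side when `axis=1` is given. The three-month sample and the variable names below are illustrative; the values are taken from the weather table.

```python
import pandas as pd

months = ["Jan", "Feb", "Mar"]

# Two labeled one-dimensional Series that share the same row labels.
normal = pd.Series([4.81, 3.31, 3.51], index=months, name="MLY-PRCP-NORMAL")
tavg = pd.Series([42.1, 43.4, 46.6], index=months, name="MLY-TAVG-NORMAL")

# Joining them column-wise yields a DataFrame; each Series name becomes a column label.
combined = pd.concat([normal, tavg], axis=1)
print(type(combined))
print(combined)
```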

### Selecting a column
If you want to select the MLY-PRCP-NORMAL column but **keep the data** in the data frame, you'll need double square brackets. Checking the type we see it's a DataFrame with only one column.

In [9]:
print('weather["MLY-PRCP-NORMAL"]')
print('type(weather["MLY-PRCP-NORMAL"]): ', type(weather["MLY-PRCP-NORMAL"]))
print(weather["MLY-PRCP-NORMAL"])

print()

print('weather[["MLY-PRCP-NORMAL"]]')
print('type(weather[["MLY-PRCP-NORMAL"]]):', type(weather[["MLY-PRCP-NORMAL"]]))
print(weather[["MLY-PRCP-NORMAL"]])

weather["MLY-PRCP-NORMAL"]
type(weather["MLY-PRCP-NORMAL"]):  <class 'pandas.core.series.Series'>
MONTH
Jan    4.81
Feb    3.31
Mar    3.51
Apr    2.77
May    2.16
Jun    1.63
Jul    0.79
Aug    0.97
Sep    1.52
Oct    3.41
Nov    5.84
Dec    5.43
Name: MLY-PRCP-NORMAL, dtype: float64

weather[["MLY-PRCP-NORMAL"]]
type(weather[["MLY-PRCP-NORMAL"]]): <class 'pandas.core.frame.DataFrame'>
       MLY-PRCP-NORMAL
MONTH                 
Jan               4.81
Feb               3.31
Mar               3.51
Apr               2.77
May               2.16
Jun               1.63
Jul               0.79
Aug               0.97
Sep               1.52
Oct               3.41
Nov               5.84
Dec               5.43


### Selecting Multiple Columns 

You can extend this call to select two columns. 

If you look at it from a different angle, you're actually putting a list with column labels inside another set of square brackets, and end up with a **'sub data frame'**, containing only selected columns.

In [10]:
weather[["MLY-PRCP-NORMAL","MLY-PRCP-25PCTL"]]

MONTH,MLY-PRCP-NORMAL,MLY-PRCP-25PCTL
Jan,4.81,3.29
Feb,3.31,2.09
Mar,3.51,2.6
Apr,2.77,2.1
May,2.16,1.34
Jun,1.63,1.04
Jul,0.79,0.3
Aug,0.97,0.33
Sep,1.52,0.64
Oct,3.41,2.03


## Selecting Rows of DataFrames 
Although it's uncommon and more of a convenience feature, you can use the same square brackets to select rows from a DataFrame. The way to do this is by specifying a slice. 

**Note:** The end of the slice is exclusive and the index starts at zero. 

In [11]:
weather[1:4]

MONTH,MLY-PRCP-NORMAL,MLY-PRCP-25PCTL,MLY-PRCP-75PCTL,MLY-TAVG-NORMAL
Feb,3.31,2.09,4.44,43.4
Mar,3.51,2.6,3.93,46.6
Apr,2.77,2.1,2.94,50.5
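
To see why a slice is needed here, the sketch below contrasts a slice with a single integer. It uses a small stand-in DataFrame (`weather_sample`, an illustrative name built from a few values of the table) so that it runs on its own.

```python
import pandas as pd

# Small stand-in for the weather DataFrame.
weather_sample = pd.DataFrame(
    {"MLY-PRCP-NORMAL": [4.81, 3.31, 3.51, 2.77],
     "MLY-TAVG-NORMAL": [42.1, 43.4, 46.6, 50.5]},
    index=["Jan", "Feb", "Mar", "Apr"],
)

# A slice selects rows by position; the end position is excluded.
print(weather_sample[1:3])

# A single integer is treated as a column label, which does not exist here.
try:
    weather_sample[1]
    found = True
except KeyError:
    found = False
print(found)
```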


## Selecting Specific Columns And Rows Of A DataFrame

In [12]:
weather[["MLY-PRCP-NORMAL","MLY-PRCP-25PCTL"]][0:4]

MONTH,MLY-PRCP-NORMAL,MLY-PRCP-25PCTL
Jan,4.81,3.29
Feb,3.31,2.09
Mar,3.51,2.6
Apr,2.77,2.1


Square brackets offer only limited functionality. 

Ideally, we would want something similar to 2D NumPy arrays, where the index or slice before the comma references the rows and the index or slice after the comma references the columns:
```my_array[rows, columns]```
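
For reference, this is what the NumPy form looks like in practice. The array below is a small illustrative sample built from values in the weather table.

```python
import numpy as np

# Rows are months (Jan, Feb, Mar); columns are three precipitation figures.
my_array = np.array([[4.81, 3.29, 6.07],
                     [3.31, 2.09, 4.44],
                     [3.51, 2.60, 3.93]])

# Before the comma: rows 0 and 1. After the comma: column 1.
subset = my_array[0:2, 1]
print(subset)

# A single element: row 2, column 0.
print(my_array[2, 0])
```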

## ```loc``` Indexer
Pandas can do the same thing, but the syntax is a little different.
To do this in pandas we have to use the ```loc``` and ```iloc``` indexers. They are accessed with square brackets rather than called like functions. 

- ```loc``` - selects parts of your data based on labels, which means that you have to specify rows and columns by their row and column labels. 
- ```iloc``` - is integer-position based, so you have to specify rows and columns by their integer position, like you did with the row slices above.

Let's try to get the row for the month 'Feb'. 


In [13]:
weather.loc['Feb']

MLY-PRCP-NORMAL     3.31
MLY-PRCP-25PCTL     2.09
MLY-PRCP-75PCTL     4.44
MLY-TAVG-NORMAL    43.40
Name: Feb, dtype: float64

Note that ```.loc``` returns a Series containing all the row's information, rather inconveniently shown on different lines. 

To get a DataFrame, we have to put the 'Feb' string (the row label) inside a double pair of square brackets.

In [14]:
weather.loc[['Feb']]

MONTH,MLY-PRCP-NORMAL,MLY-PRCP-25PCTL,MLY-PRCP-75PCTL,MLY-TAVG-NORMAL
Feb,3.31,2.09,4.44,43.4


### Using ```loc``` with Multiple Rows 
Pass in multiple row labels <span style="color:red">inside double square brackets</span>, separated by commas.

This will select the entire rows, meaning all columns will be included. 

The difference between ```loc``` with a list and plain square brackets is that with ```loc``` you can extend your selection with a comma and a specification of the columns of interest. 

In [21]:
# weather.loc['Jan','Feb','Mar'] # Will not work: too many indexers for single brackets
weather.loc[['Jan','Feb','Mar']]

MONTH,MLY-PRCP-NORMAL,MLY-PRCP-25PCTL,MLY-PRCP-75PCTL,MLY-TAVG-NORMAL
Jan,4.81,3.29,6.07,42.1
Feb,3.31,2.09,4.44,43.4
Mar,3.51,2.6,3.93,46.6


### Row and Column ```loc``` 

Let's select the same rows but now only specific columns; the intersection will be returned.

Syntax: ```df.loc[[row labels separated by commas], [column labels separated by commas]]``` 

In [23]:
weather.loc[['Jan','Feb','Mar'],['MLY-PRCP-NORMAL', 'MLY-PRCP-25PCTL']]

MONTH,MLY-PRCP-NORMAL,MLY-PRCP-25PCTL
Jan,4.81,3.29
Feb,3.31,2.09
Mar,3.51,2.6


### Selecting all rows but only specific columns

Replace the first list that specifies the row labels with a colon, which specifies a slice going from beginning to end. 

In [24]:
weather.loc[ : ,['MLY-PRCP-NORMAL', 'MLY-PRCP-25PCTL']]

MONTH,MLY-PRCP-NORMAL,MLY-PRCP-25PCTL
Jan,4.81,3.29
Feb,3.31,2.09
Mar,3.51,2.6
Apr,2.77,2.1
May,2.16,1.34
Jun,1.63,1.04
Jul,0.79,0.3
Aug,0.97,0.33
Sep,1.52,0.64
Oct,3.41,2.03


## Recap
- Square brackets:
    - Column access: ```df[['col1','col2']]```  Note: double square brackets are needed because selecting two columns (two Series) returns a DataFrame
    - Row access: ```df[start_row_num: end_row_num_not_included]```

- loc (label-based):
    - Column access: ```df.loc[ : , ['col1','col2'] ]```
    - Row access: ```df.loc[['row_0_label','row_10_label']]```
    - Column and Row Access: ```df.loc[['row_0_label','row_10_label'] , ['col1','col2'] ]```
    

When using ```loc```, subsetting becomes similar to how you subsetted 2D Numpy Arrays. The only difference is that you have to use row labels with ```loc```, not the positions of the elements. 
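
One further difference from NumPy that is worth knowing: when ```loc``` is given a slice of labels, the end label is included in the result. The sketch below uses a small stand-in DataFrame (`weather_sample`, an illustrative name) to show this.

```python
import pandas as pd

# Small stand-in for the weather DataFrame.
weather_sample = pd.DataFrame(
    {"MLY-PRCP-NORMAL": [4.81, 3.31, 3.51, 2.77],
     "MLY-PRCP-25PCTL": [3.29, 2.09, 2.60, 2.10],
     "MLY-TAVG-NORMAL": [42.1, 43.4, 46.6, 50.5]},
    index=["Jan", "Feb", "Mar", "Apr"],
)

# Label slices with loc include both endpoints, so 'Mar' is part of the result.
rows = weather_sample.loc["Jan":"Mar"]
print(rows)

# The same applies to a slice of column labels after the comma.
block = weather_sample.loc["Jan":"Mar", "MLY-PRCP-NORMAL":"MLY-PRCP-25PCTL"]
print(block)
```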
<hr>


## ```iloc``` Indexer

If you want to subset pandas DataFrames based on their position, or integer index, you'll need to use the ```iloc``` indexer.

### Row Access Using ```iloc``` 
Let's get the 'Jan' row using ```iloc```. Instead of passing in the row label, we will use its integer position. 

In [27]:
print(weather.loc[['Jan']])
print()
print(weather.iloc[[0]])

       MLY-PRCP-NORMAL  MLY-PRCP-25PCTL  MLY-PRCP-75PCTL  MLY-TAVG-NORMAL
MONTH                                                                    
Jan               4.81             3.29             6.07             42.1

       MLY-PRCP-NORMAL  MLY-PRCP-25PCTL  MLY-PRCP-75PCTL  MLY-TAVG-NORMAL
MONTH                                                                    
Jan               4.81             3.29             6.07             42.1


### Multiple Row Access Using ```iloc``` 

In [30]:
print(weather.loc[['Jan','Feb','Mar']])
print()
# print(weather.iloc[[0:3]]) # SyntaxError: a slice cannot go inside a list; weather.iloc[0:3] works
print(weather.iloc[[0,1,2]])

       MLY-PRCP-NORMAL  MLY-PRCP-25PCTL  MLY-PRCP-75PCTL  MLY-TAVG-NORMAL
MONTH                                                                    
Jan               4.81             3.29             6.07             42.1
Feb               3.31             2.09             4.44             43.4
Mar               3.51             2.60             3.93             46.6

       MLY-PRCP-NORMAL  MLY-PRCP-25PCTL  MLY-PRCP-75PCTL  MLY-TAVG-NORMAL
MONTH                                                                    
Jan               4.81             3.29             6.07             42.1
Feb               3.31             2.09             4.44             43.4
Mar               3.51             2.60             3.93             46.6
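
Slices do work with ```iloc```, as long as they are not wrapped in a list. The sketch below, again on a small illustrative stand-in DataFrame, checks that a positional slice and an explicit list of positions give the same rows.

```python
import pandas as pd

# Small stand-in for the weather DataFrame.
weather_sample = pd.DataFrame(
    {"MLY-PRCP-NORMAL": [4.81, 3.31, 3.51, 2.77],
     "MLY-TAVG-NORMAL": [42.1, 43.4, 46.6, 50.5]},
    index=["Jan", "Feb", "Mar", "Apr"],
)

by_list = weather_sample.iloc[[0, 1, 2]]   # explicit list of positions
by_slice = weather_sample.iloc[0:3]        # positional slice, end excluded

# Both selections contain the rows Jan, Feb and Mar.
print(by_slice.equals(by_list))
```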


## Row and Column ```iloc```

In [35]:
print(weather.loc[['Jan','Feb','Mar'],['MLY-PRCP-NORMAL', 'MLY-PRCP-25PCTL']])
print()
print(weather.iloc[[0,1,2], [0, 1]])
print(weather.iloc[: , [0, 1]])

       MLY-PRCP-NORMAL  MLY-PRCP-25PCTL
MONTH                                  
Jan               4.81             3.29
Feb               3.31             2.09
Mar               3.51             2.60

       MLY-PRCP-NORMAL  MLY-PRCP-25PCTL
MONTH                                  
Jan               4.81             3.29
Feb               3.31             2.09
Mar               3.51             2.60
       MLY-PRCP-NORMAL  MLY-PRCP-25PCTL
MONTH                                  
Jan               4.81             3.29
Feb               3.31             2.09
Mar               3.51             2.60
Apr               2.77             2.10
May               2.16             1.34
Jun               1.63             1.04
Jul               0.79             0.30
Aug               0.97             0.33
Sep               1.52             0.64
Oct               3.41             2.03
Nov               5.84             4.53
Dec               5.43             3.94


<hr>

## Square Brackets (1)
In the sample code below, the same cars data is imported from a CSV file as a pandas DataFrame. To select only the cars_per_cap column from cars, you can use:

```
cars['cars_per_cap']
cars[['cars_per_cap']]
```

<span style="background-color:yellow">The single bracket version gives a Pandas Series, the double bracket version gives a Pandas DataFrame.</span>

## Instructions
- Use single square brackets to print out the country column of cars as a Pandas Series.
- Use double square brackets to print out the country column of cars as a Pandas DataFrame.
- Use double square brackets to print out a DataFrame with both the country and drives_right columns of cars, in this order.

In [52]:
# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
print(cars)

# Print out country column as Pandas Series
print(cars['country'])

# Print out country column as Pandas DataFrame
print(cars[['country']])

# Print out DataFrame with country and drives_right columns
print(cars[['country','drives_right']])

     cars_per_cap        country  drives_right
US            809  United States          True
AUS           731      Australia         False
JAP           588          Japan         False
IN             18          India         False
RU            200         Russia          True
MOR            70        Morocco          True
EG             45          Egypt          True
US     United States
AUS        Australia
JAP            Japan
IN             India
RU            Russia
MOR          Morocco
EG             Egypt
Name: country, dtype: object
           country
US   United States
AUS      Australia
JAP          Japan
IN           India
RU          Russia
MOR        Morocco
EG           Egypt
           country  drives_right
US   United States          True
AUS      Australia         False
JAP          Japan         False
IN           India         False
RU          Russia          True
MOR        Morocco          True
EG           Egypt          True


<hr>

## Square Brackets (2)
Square brackets can do more than just selecting columns. You can also use them to get rows, or observations, from a DataFrame. The following call selects the first five rows from the cars DataFrame:

```cars[0:5]```
The result is another DataFrame containing only the rows you specified.

<span style="background-color:yellow">Pay attention: You can only select rows using square brackets if you specify a slice, like ```0:4```. Also, you're using the integer indexes of the rows here, not the row labels!</span>

## Instructions
- Select the first 3 observations from cars and print them out.
- Select the fourth, fifth and sixth observation, corresponding to row indexes 3, 4 and 5, and print them out.

In [41]:
# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)

# Print out first 3 observations
print(cars[0:3])

# Print out fourth, fifth and sixth observation
print(cars[3:6])

     cars_per_cap        country  drives_right
US            809  United States          True
AUS           731      Australia         False
JAP           588          Japan         False
     cars_per_cap  country  drives_right
IN             18    India         False
RU            200   Russia          True
MOR            70  Morocco          True


<hr>

## ```loc``` and ```iloc``` (1) Selecting Rows
With loc and iloc you can do practically any data selection operation on DataFrames you can think of. 

- ```loc``` is label-based, which means that you have to specify rows and columns based on their row and column labels. 

- ```iloc``` is integer index based, so you have to specify rows and columns by their integer index like you did in the previous exercise.

Try out the following commands in the IPython Shell to experiment with loc and iloc to select observations. Each pair of commands here gives the same result.

```
cars.loc['RU']
cars.iloc[4]

cars.loc[['RU']]
cars.iloc[[4]]

cars.loc[['RU', 'AUS']]
cars.iloc[[4, 1]]
```
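
If you are unsure which integer position belongs to a row label, the index can tell you through its `get_loc()` method. The sketch below rebuilds the cars data from the lists used earlier in this notebook so that it runs without `cars.csv`; the column order is assumed to match the printed output.

```python
import pandas as pd

# Rebuild the cars DataFrame from the values shown earlier.
cars = pd.DataFrame(
    {"cars_per_cap": [809, 731, 588, 18, 200, 70, 45],
     "country": ["United States", "Australia", "Japan", "India",
                 "Russia", "Morocco", "Egypt"],
     "drives_right": [True, False, False, False, True, True, True]},
    index=["US", "AUS", "JAP", "IN", "RU", "MOR", "EG"],
)

# Translate a row label into its integer position.
position = cars.index.get_loc("RU")
print(position)

# The label-based and position-based lookups return the same row.
print(cars.loc["RU"].equals(cars.iloc[position]))
```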

As before, code is included that imports the cars data as a Pandas DataFrame.

## Instructions
- Use loc or iloc to select the observation corresponding to Japan as a Series. The label of this row is JAP, the index is 2. Make sure to print the resulting Series.
- Use loc or iloc to select the observations for Australia and Egypt as a DataFrame. You can find out about the labels/indexes of these rows by inspecting cars in the IPython Shell. Make sure to print the resulting DataFrame.

In [51]:
# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)

# Print out observation for Japan
print(cars.loc['JAP']) # print a Series
print(cars.loc['JAP', 'cars_per_cap']) # print a single value from that row

print(cars.iloc[2]) # print a Series
print(cars.iloc[2, 0]) # print a single value from that row


# Print out observations for Australia and Egypt
# Passing a list of labels or positions (double brackets) returns a DataFrame
print(cars.loc[['AUS', 'EG']])
print(cars.iloc[[1, 6]])
# Note: cars.iloc[[1:6]] is a SyntaxError because a slice cannot go inside a list; cars.iloc[1:6] works

cars_per_cap      588
country         Japan
drives_right    False
Name: JAP, dtype: object
588
cars_per_cap      588
country         Japan
drives_right    False
Name: JAP, dtype: object
588
     cars_per_cap    country  drives_right
AUS           731  Australia         False
EG             45      Egypt          True
     cars_per_cap    country  drives_right
AUS           731  Australia         False
EG             45      Egypt          True


<hr>

## ```loc``` and ```iloc``` (2) Selecting Rows and Columns
loc and iloc also allow you to select both rows and columns from a DataFrame. To experiment, try out the following commands in the IPython Shell. Again, paired commands produce the same result.

```
cars.loc['IN', 'cars_per_cap']
cars.iloc[3, 0]

cars.loc[['IN', 'RU'], 'cars_per_cap']
cars.iloc[[3, 4], 0]

cars.loc[['IN', 'RU'], ['cars_per_cap', 'country']]
cars.iloc[[3, 4], [0, 1]]
```
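
The three pairs above differ in what they return: a single value, a Series, and a DataFrame. A minimal sketch that checks this, with the cars data rebuilt inline (column order assumed from the printed output) so it runs without `cars.csv`:

```python
import pandas as pd

# Rebuild the cars DataFrame from the values shown earlier.
cars = pd.DataFrame(
    {"cars_per_cap": [809, 731, 588, 18, 200, 70, 45],
     "country": ["United States", "Australia", "Japan", "India",
                 "Russia", "Morocco", "Egypt"],
     "drives_right": [True, False, False, False, True, True, True]},
    index=["US", "AUS", "JAP", "IN", "RU", "MOR", "EG"],
)

value = cars.loc["IN", "cars_per_cap"]                          # one row, one column
column_part = cars.loc[["IN", "RU"], "cars_per_cap"]            # list of rows, one column
table_part = cars.loc[["IN", "RU"], ["cars_per_cap", "country"]]  # lists on both sides

print(value)
print(type(column_part).__name__)
print(type(table_part).__name__)
```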

## Instructions
- Print out the drives_right value of the row corresponding to Morocco (its row label is MOR)
- Print out a sub-DataFrame, containing the observations for Russia and Morocco and the columns country and drives_right.

In [53]:
# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)

# Print out drives_right value of Morocco
print(cars.loc['MOR','drives_right'])
print(cars.iloc[5,2])

# Print sub-DataFrame
print(cars.loc[['RU','MOR'],['country','drives_right']])
print(cars.iloc[[4,5],[1,2]])

True
True
     country  drives_right
RU    Russia          True
MOR  Morocco          True
     country  drives_right
RU    Russia          True
MOR  Morocco          True


<hr>

## ```loc``` and ```iloc``` (3) Selecting Only Columns
It's also possible to select only columns with loc and iloc. In both cases, you simply put a slice going from beginning to end in front of the comma:

```
cars.loc[:, 'country']
cars.iloc[:, 1]

cars.loc[:, ['country','drives_right']]
cars.iloc[:, [1, 2]]
```

## Instructions
- Print out the drives_right column as a Series using loc or iloc.
- Print out the drives_right column as a DataFrame using loc or iloc.
- Print out both the cars_per_cap and drives_right column as a DataFrame using loc or iloc.

In [56]:
# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)

# Print out drives_right column as Series
print(cars.loc[:, 'drives_right'])
print(cars.iloc[:,2])

# Print out drives_right column as DataFrame
print(cars.loc[:, ['drives_right']])
print(cars.iloc[:,[2]])

# Print out cars_per_cap and drives_right as DataFrame
print(cars.loc[:,['cars_per_cap','drives_right']])
print(cars.iloc[:,[0,2]])

US      True
AUS    False
JAP    False
IN     False
RU      True
MOR     True
EG      True
Name: drives_right, dtype: bool
US      True
AUS    False
JAP    False
IN     False
RU      True
MOR     True
EG      True
Name: drives_right, dtype: bool
     drives_right
US           True
AUS         False
JAP         False
IN          False
RU           True
MOR          True
EG           True
     drives_right
US           True
AUS         False
JAP         False
IN          False
RU           True
MOR          True
EG           True
     cars_per_cap  drives_right
US            809          True
AUS           731         False
JAP           588         False
IN             18         False
RU            200          True
MOR            70          True
EG             45          True
     cars_per_cap  drives_right
US            809          True
AUS           731         False
JAP           588         False
IN             18         False
RU            200          True
MOR            70 