# Pandas

Pandas is a powerful and popular Python library used for data manipulation and analysis. It provides flexible data structures, primarily:

•	Series (1D labeled arrays) <br>
•	DataFrames (2D tables similar to Excel or SQL tables)<br>
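As a quick illustration of the two structures, the sketch below builds a small Series and a small DataFrame from three rows of the vehicle data used later in these notes. The variable names `cars_per_cap` and `cars_small` are hypothetical, chosen only for this example.

```python
import pandas as pd

# Series: a 1D array of values, each paired with a row label (the index)
cars_per_cap = pd.Series([809, 731, 588], index=['US', 'AUS', 'JPN'])

# DataFrame: a 2D table with row labels (index) and column labels (columns)
cars_small = pd.DataFrame(
    {'country': ['United States', 'Australia', 'Japan'],
     'cars_per_cap': [809, 731, 588]},
    index=['US', 'AUS', 'JPN'],
)

print(cars_per_cap['JPN'])                   # look up one value by its row label: 588
print(cars_per_cap.ndim, cars_small.ndim)    # 1 2
print(cars_small.shape)                      # (3, 2): 3 rows, 2 columns
```

Each column of a DataFrame is itself a Series that shares the DataFrame's row labels.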

Key Features of Pandas:<br>

✔ Easy Data Handling – Load, clean, and manipulate datasets efficiently.<br>
✔ Data Analysis & Aggregation – Perform filtering, grouping, and statistical operations.<br>
✔ Data Visualization – Works well with Matplotlib and Seaborn for charts and graphs.<br>
✔ File Support – Read and write CSV, Excel, SQL, JSON, and more.<br>
✔ Indexing & Slicing – Quick access to specific rows and columns.<br>
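A few of these features can be sketched in a handful of lines. The example below is a minimal sketch using the vehicle data from the exercises: it filters rows with a condition, groups and aggregates, and writes the table to CSV text and reads it back. An in-memory `io.StringIO` buffer stands in for a real file here; that choice is an assumption made only to keep the example self-contained.

```python
import io
import pandas as pd

cars = pd.DataFrame({
    'country': ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt'],
    'drives_right': [True, False, False, False, True, True, True],
    'cars_per_cap': [809, 731, 588, 18, 200, 70, 45],
})

# Filtering: keep only the rows where the condition is True
many_cars = cars[cars['cars_per_cap'] > 500]

# Grouping and aggregation: mean cars_per_cap for each driving side
mean_by_side = cars.groupby('drives_right')['cars_per_cap'].mean()

# File support: write CSV text into a buffer, then read it back
buffer = io.StringIO()
cars.to_csv(buffer, index=False)
buffer.seek(0)
round_trip = pd.read_csv(buffer)

print(many_cars['country'].tolist())   # ['United States', 'Australia', 'Japan']
print(mean_by_side)
print(round_trip.head())
```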


As a data scientist, you'll often work with large amounts of data. The form of this data can vary greatly, but quite often you can reduce it to a tabular structure, that is, a table like a spreadsheet. Let's have a look at some examples.<br>

## Exercise

**Dictionary to DataFrame (1)**

Pandas is an open-source library providing high-performance, easy-to-use data structures and data analysis tools for Python. Sounds promising! <br>

The DataFrame is one of Pandas' most important data structures. It's basically a way to store tabular data where you can label the rows and the columns. One way to build a DataFrame is from a dictionary. <br>

In the exercises that follow, you will work with vehicle data from different countries. Each observation corresponds to a country, and the columns give information such as the number of vehicles per capita and whether people drive on the left or the right. <br>

Three lists are defined in the script: <br>

`names`, containing the country names for which data is available.<br>
`dr`, a list of booleans that tells whether people drive on the right (`True`) or on the left (`False`) in the corresponding country.<br>
`cpc`, the number of motor vehicles per 1000 people in the corresponding country.<br>
When you build a DataFrame from a dictionary, each dictionary key becomes a column label and each value is a list containing that column's elements.<br>
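To see the key-to-column mapping on its own, the sketch below uses a hypothetical two-row dictionary `toy`. It also shows that all the lists must have the same length: pandas rejects mismatched lists with a `ValueError`. The flag `mismatch_rejected` exists only to record that outcome.

```python
import pandas as pd

# Hypothetical toy data: each key becomes a column label,
# and each list supplies that column's values, top to bottom
toy = {'country': ['Japan', 'India'], 'cars_per_cap': [588, 18]}
df = pd.DataFrame(toy)
print(df.columns.tolist())          # ['country', 'cars_per_cap']
print(df['cars_per_cap'].tolist())  # [588, 18]

# Lists of different lengths cannot form a rectangular table
mismatch_rejected = False
try:
    pd.DataFrame({'country': ['Japan', 'India'], 'cars_per_cap': [588, 18, 200]})
except ValueError as err:
    mismatch_rejected = True
    print('ValueError:', err)
```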

**Instructions**

Import pandas as pd.<br>
Use the pre-defined lists to create a dictionary called `my_dict`. There should be three key-value pairs:<br>
key `'country'` and value `names`.<br>
key `'drives_right'` and value `dr`.<br>
key `'cars_per_cap'` and value `cpc`.<br>
Use `pd.DataFrame()` to turn your dict into a DataFrame called `cars`.<br>
Print out `cars` and see how beautiful it is.<br>

```python
# Pre-defined lists
names = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt']
dr = [True, False, False, False, True, True, True]
cpc = [809, 731, 588, 18, 200, 70, 45]

# Import pandas as pd
import pandas as pd

# Create dictionary my_dict with three key:value pairs: my_dict
my_dict = {'country': names, 'drives_right': dr, 'cars_per_cap': cpc}

# Build a DataFrame cars from my_dict: cars
cars = pd.DataFrame(my_dict)

# Print cars
print(cars)
```

```
         country  drives_right  cars_per_cap
0  United States          True           809
1      Australia         False           731
2          Japan         False           588
3          India         False            18
4         Russia          True           200
5        Morocco          True            70
6          Egypt          True            45
```


**Dictionary to DataFrame (2)**

The Python code that solves the previous exercise is included in the script. Have you noticed that the row labels (i.e., the labels for the different observations) were automatically set to integers from 0 up to 6?

To fix this, a list `row_labels` has been created. You can use it to specify the row labels of the `cars` DataFrame by setting the `index` attribute of `cars`, which you can access as `cars.index`.
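The sketch below shows the default integer index and two ways to replace it: assigning to `cars.index` after construction, as the exercise does, or passing the labels through the `index` parameter of `pd.DataFrame()`. The second option and the name `cars_alt` are additions for comparison, not part of the exercise.

```python
import pandas as pd

names = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt']
dr = [True, False, False, False, True, True, True]
cpc = [809, 731, 588, 18, 200, 70, 45]
row_labels = ['US', 'AUS', 'JPN', 'IN', 'RU', 'MOR', 'EG']
cars_dict = {'country': names, 'drives_right': dr, 'cars_per_cap': cpc}

# Without explicit labels, pandas numbers the rows 0..n-1
cars = pd.DataFrame(cars_dict)
print(cars.index)   # RangeIndex(start=0, stop=7, step=1)

# Option 1: overwrite the index attribute after construction
cars.index = row_labels

# Option 2: pass the labels while building the DataFrame
cars_alt = pd.DataFrame(cars_dict, index=row_labels)

print(cars.index.tolist())     # ['US', 'AUS', 'JPN', 'IN', 'RU', 'MOR', 'EG']
print(cars.equals(cars_alt))   # True
```

Either way, the list of labels must have exactly one entry per row.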

**Instructions**

Run the code to see that, indeed, the row labels are not set correctly.<br>
Specify the row labels by setting `cars.index` equal to `row_labels`.<br>
Print out `cars` again and check if the row labels are correct this time.<br>

```python
import pandas as pd

# Build cars DataFrame
names = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt']
dr = [True, False, False, False, True, True, True]
cpc = [809, 731, 588, 18, 200, 70, 45]
cars_dict = {'country': names, 'drives_right': dr, 'cars_per_cap': cpc}
cars = pd.DataFrame(cars_dict)
print(cars)

# Definition of row_labels
row_labels = ['US', 'AUS', 'JPN', 'IN', 'RU', 'MOR', 'EG']

# Specify row labels of cars
cars.index = row_labels

# Print cars again
print(cars)
```

```
         country  drives_right  cars_per_cap
0  United States          True           809
1      Australia         False           731
2          Japan         False           588
3          India         False            18
4         Russia          True           200
5        Morocco          True            70
6          Egypt          True            45
           country  drives_right  cars_per_cap
US   United States          True           809
AUS      Australia         False           731
JPN          Japan         False           588
IN           India         False            18
RU          Russia          True           200
MOR        Morocco          True            70
EG           Egypt          True            45
```


**Square Brackets (1)**

In the video, you saw that you can index and select Pandas DataFrames in many different ways. The simplest, though not the most powerful, way is to use square brackets.<br>

In the sample code, the same cars data is imported from a CSV file as a Pandas DataFrame. To select only the `cars_per_cap` column from `cars`, you can use either of the following:<br>

```python
cars['cars_per_cap']     # single brackets: returns a Series
cars[['cars_per_cap']]   # double brackets: returns a DataFrame
```
The single bracket version gives a Pandas Series, the double bracket version gives a Pandas DataFrame.<br>
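The difference is easy to check with `type()` and `.shape`. This minimal sketch builds a three-row subset of the cars data inline instead of reading `cars.csv`, so it runs without the file. The inner brackets in the double-bracket form are simply a Python list of column labels, which is why you can list several columns there.

```python
import pandas as pd

# Inline stand-in for the cars data (assumption: no cars.csv needed)
cars = pd.DataFrame(
    {'country': ['United States', 'Australia', 'Japan'],
     'cars_per_cap': [809, 731, 588]},
    index=['US', 'AUS', 'JPN'],
)

col_series = cars['cars_per_cap']    # single brackets -> Series
col_frame = cars[['cars_per_cap']]   # double brackets (a list of labels) -> DataFrame

print(type(col_series).__name__, col_series.shape)   # Series (3,)
print(type(col_frame).__name__, col_frame.shape)     # DataFrame (3, 1)
```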

**Instructions**

Use single square brackets to print out the `country` column of `cars` as a Pandas Series.<br>
Use double square brackets to print out the `country` column of `cars` as a Pandas DataFrame.<br>
Use double square brackets to print out a DataFrame with both the `country` and `drives_right` columns of `cars`, in this order.<br>

```python
# Import cars data
# Assumes cars.csv (with row labels in its first column) is in the working directory
import pandas as pd
cars = pd.read_csv('cars.csv', index_col=0)

# Print out country column as Pandas Series (1D)
print(cars['country'])

# Print out country column as Pandas DataFrame (2D)
print(cars[['country']])

# Print out DataFrame with both 'country' and 'drives_right' columns
print(cars[['country', 'drives_right']])
```

**Square Brackets (2)**

Square brackets can do more than just select columns. You can also use them to get rows, or observations, from a DataFrame. The following call selects the first five rows from the `cars` DataFrame:
```python
cars[0:5]   # rows at positions 0 through 4; position 5 is excluded
```
The result is another DataFrame containing only the rows you specified.<br>

Pay attention: you can select rows with square brackets only by specifying a slice, such as `0:4`. Also, the slice uses the integer positions of the rows, not the row labels!<br>
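A short sketch of these slicing rules, using a four-row subset of the cars data built inline (an assumption so the example runs without `cars.csv`). As with Python lists, the start position is included and the stop position is excluded, while the selected rows keep their original labels.

```python
import pandas as pd

cars = pd.DataFrame(
    {'country': ['United States', 'Australia', 'Japan', 'India'],
     'cars_per_cap': [809, 731, 588, 18]},
    index=['US', 'AUS', 'JPN', 'IN'],
)

# Positions 1 and 2; position 3 is excluded
middle = cars[1:3]
print(middle.index.tolist())   # ['AUS', 'JPN']: the rows keep their labels

# An omitted start defaults to position 0
first_two = cars[:2]
print(first_two['country'].tolist())   # ['United States', 'Australia']
```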

**Instructions**

Select the first 3 observations from `cars` and print them out.<br>
Select the fourth, fifth and sixth observations, corresponding to row indexes 3, 4 and 5, and print them out.<br>
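A minimal sketch of both selections, assuming `cars` is rebuilt from the lists and row labels used earlier instead of being read from `cars.csv`, so it holds only the three columns shown above.

```python
import pandas as pd

names = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt']
dr = [True, False, False, False, True, True, True]
cpc = [809, 731, 588, 18, 200, 70, 45]
row_labels = ['US', 'AUS', 'JPN', 'IN', 'RU', 'MOR', 'EG']
cars = pd.DataFrame({'country': names, 'drives_right': dr, 'cars_per_cap': cpc},
                    index=row_labels)

# First 3 observations: positions 0, 1 and 2
first_three = cars[0:3]
print(first_three)

# Fourth, fifth and sixth observations: positions 3, 4 and 5
fourth_to_sixth = cars[3:6]
print(fourth_to_sixth)
```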