<a href="https://colab.research.google.com/github/mohamedyosef101/Python-for-AI/blob/main/03_working_with_data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Working with data
In this notebook, we'll cover three Python libraries: `pandas` for working with tabular data, much as you would in SQL or Excel; `numpy` for mathematical operations on arrays and matrices; and `matplotlib` for visualizing data.

## Pandas
Pandas is a powerful Python library that provides data structures and functions to efficiently manipulate large datasets.

<br>

> The two main data structures in pandas are Series and DataFrame.

<br>



### Series
A Series is a one-dimensional object that can hold data of any type (integers, strings, floats, etc.). It is similar to a column in a spreadsheet or a list in Python, except that each value is paired with an index label (0, 1, 2, ... by default).

In [4]:
import pandas as pd

# Creating a Series
my_series = pd.Series([21, 23, 31, 24, 32])
print(my_series)

0    21
1    23
2    31
3    24
4    32
dtype: int64
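
The index does not have to be numeric. As a minimal sketch, assuming the same five values are readings for five weekdays (the weekday labels and the variable name `temperatures` are made up for illustration), a Series can be given custom labels and then queried by label or filtered with a condition:

```python
import pandas as pd

# Hypothetical labels: weekday names replace the default 0..4 index
temperatures = pd.Series(
    [21, 23, 31, 24, 32],
    index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri'],
)

# Label-based lookup, much like reading a value from a dictionary
wednesday = temperatures['Wed']

# A comparison produces a True/False mask that filters the Series
above_average = temperatures[temperatures > temperatures.mean()]

print(wednesday)
print(above_average)
```

The labels travel with the values, so the filtered result still tells you which days were above average.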


### DataFrame
A DataFrame is a two-dimensional table-like data structure with labeled axes (rows and columns). It is similar to a table in a database or an Excel spreadsheet.

In [8]:
# Creating a DataFrame from a dictionary
data = {
    'Name': ['Mohamed', 'Yosef', 'Faiz'],
    'Age': [25, 54, 21],
    'City': ['Paris', 'Mansoura', 'Roma']
}

df = pd.DataFrame(data)
df

      Name  Age      City
0  Mohamed   25     Paris
1    Yosef   54  Mansoura
2     Faiz   21      Roma


In [21]:
# Viewing the first row
# head() returns the first 5 rows by default; pass a number to change that
df.head(1)

      Name  Age   City
0  Mohamed   25  Paris


In [14]:
# Viewing the last row
df.tail(1)

   Name  Age  City
2  Faiz   21  Roma


In [18]:
# Accessing a single column
print(df['Name'])

0    Mohamed
1      Yosef
2       Faiz
Name: Name, dtype: object


In [20]:
# Accessing multiple columns
df[['Name', 'City']]

      Name      City
0  Mohamed     Paris
1    Yosef  Mansoura
2     Faiz      Roma
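
The two selections above return different kinds of objects. As a small sketch (the DataFrame is rebuilt under the assumed name `people` so the example runs on its own), single brackets with one label give a Series, while a list of labels gives a DataFrame, even when the list holds only one column:

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Mohamed', 'Yosef', 'Faiz'],
    'Age': [25, 54, 21],
    'City': ['Paris', 'Mansoura', 'Roma'],
})

single = people['Name']      # one label -> Series
double = people[['Name']]    # list of labels -> DataFrame with one column

print(type(single).__name__)
print(type(double).__name__)
print(double.shape)
```

This matters when a later step expects a table rather than a single column.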


In [27]:
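# Accessing rows by index
# .loc slices by label and includes the end label, so 0:1 returns rows 0 and 1
print(f"Using label-based indexing: \n\n{df.loc[0:1]}")
print("\n\n===============\n\n")
# .iloc slices by integer position and excludes the end, like list slicing,
# so 0:2 also returns rows 0 and 1
print(f"Using integer-based indexing: \n\n{df.iloc[0:2]}")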

Using label-based indexing: 

      Name  Age      City
0  Mohamed   25     Paris
1    Yosef   54  Mansoura


===============


Using integer-based indexing: 

      Name  Age      City
0  Mohamed   25     Paris
1    Yosef   54  Mansoura
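
With the default index, labels and positions happen to be the same numbers, which hides the difference between the two methods. Here is a minimal sketch, assuming a made-up index of 10, 20, 30 (and the assumed name `people`), where `.loc` and `.iloc` clearly diverge:

```python
import pandas as pd

people = pd.DataFrame(
    {'Name': ['Mohamed', 'Yosef', 'Faiz'], 'Age': [25, 54, 21]},
    index=[10, 20, 30],  # hypothetical labels, chosen to differ from positions
)

by_label = people.loc[10:20]     # labels 10 through 20, end label included
by_position = people.iloc[0:1]   # position 0 only, end position excluded

print(by_label)
print(by_position)
```

The label slice returns two rows and the position slice returns one, because only `.iloc` follows the list-style rule of excluding the end.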


In [29]:
# Adding a new column
df['Country'] = ['France', 'Egypt', 'Italy']
df

      Name  Age      City Country
0  Mohamed   25     Paris  France
1    Yosef   54  Mansoura   Egypt
2     Faiz   21      Roma   Italy


In [31]:
# Updating a specific value
# Suppose we want to change Mohamed's age to 22.
# .at[row_label, column_label] reads or writes a single cell:
# here, the 'Age' cell in row 0 is set to 22.
df.at[0, 'Age'] = 22
df.iloc[0]

Name       Mohamed
Age             22
City         Paris
Country     France
Name: 0, dtype: object
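
Often you know which person to update but not the row label. One possible approach, sketched here on a rebuilt DataFrame with the assumed name `people`, is to select the row with a condition and assign through `.loc`:

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Mohamed', 'Yosef', 'Faiz'],
    'Age': [25, 54, 21],
})

# Build a boolean mask that is True only for the matching row(s)
is_mohamed = people['Name'] == 'Mohamed'

# .loc[row_selector, column_label] assigns to every row where the mask is True
people.loc[is_mohamed, 'Age'] = 22

print(people)
```

Unlike `.at`, which targets exactly one cell, this form updates every row that satisfies the condition.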

In [33]:
# Updating the entire column
# For example, a year has passed and everyone's age needs to increase by 1.
# The addition is applied to every row at once.
df['Age'] = df['Age'] + 1
print(df)

      Name  Age      City Country
0  Mohamed   23     Paris  France
1    Yosef   55  Mansoura   Egypt
2     Faiz   22      Roma   Italy
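
The same element-wise arithmetic can fill a new column instead of overwriting an existing one. A short sketch, where `people` and the column name `AgeInTenYears` are hypothetical names used only for illustration:

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Mohamed', 'Yosef', 'Faiz'],
    'Age': [22, 54, 21],
})

# 'AgeInTenYears' is a made-up column name; the addition runs on every row
people['AgeInTenYears'] = people['Age'] + 10

# The 'Age' column is unchanged because the result went to a new column
print(people)
```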


In [34]:
# Removing a column (drop returns a new DataFrame, so we assign it back to df)
df = df.drop('Country', axis=1)
df

      Name  Age      City
0  Mohamed   23     Paris
1    Yosef   55  Mansoura
2     Faiz   22      Roma


In [35]:
# Removing a row by its index label
df = df.drop(0, axis=0)
df

    Name  Age      City
1  Yosef   55  Mansoura
2   Faiz   22      Roma
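
Notice that the remaining rows keep their original labels, so the index now starts at 1. As a minimal sketch (the DataFrame is rebuilt under the assumed name `people`), `drop` leaves the original untouched and `reset_index(drop=True)` renumbers the rows from 0:

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Mohamed', 'Yosef', 'Faiz'],
    'Age': [23, 55, 22],
})

# drop returns a new DataFrame; 'people' itself still has all three rows
remaining = people.drop(index=0)

# drop=True discards the old labels instead of keeping them as a column
renumbered = remaining.reset_index(drop=True)

print(list(remaining.index))
print(list(renumbered.index))
```

Renumbering is optional, but it avoids surprises when later code assumes the first row has label 0.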


Later, we will talk about handling missing values with pandas.