
# Session 6: Introduction to Pandas
## Rodrigo Careaga
### 22/May/2024


### What is Pandas?

Pandas is an open-source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

### Key Features of Pandas
- Fast and efficient DataFrame object for data manipulation with integrated indexing.
- Tools for reading and writing data between in-memory data structures and different file formats.
- Data alignment and integrated handling of missing data.
- Reshaping and pivoting of data sets.
- Label-based slicing, indexing, and subsetting of large data sets.
- Insertion and deletion of columns in data structures.
- A group-by engine for split-apply-combine operations on data sets.
- High-performance merging and joining of data.
- Time series functionality.
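As a small illustration of the split-apply-combine item above, here is a minimal sketch using `groupby`. The `Region`/`Sales` data is hypothetical and exists only for this example.

```python
import pandas as pd

# Hypothetical sales records (not part of the session's data)
sales = pd.DataFrame({
    'Region': ['North', 'South', 'North', 'South', 'North'],
    'Sales': [100, 200, 150, 50, 250],
})

# Split the rows by Region, apply sum() to each group, combine into one Series
totals = sales.groupby('Region')['Sales'].sum()
print(totals)  # North → 500, South → 250
```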

## Importing Pandas
To use pandas, you first need to import the pandas package. We usually import it as `pd`:

In [None]:
import pandas as pd

## Series and DataFrame

### Series
A Series is a one-dimensional array-like object containing an array of data and an associated array of data labels, called its index.

In [None]:
# Creating a simple Pandas Series from a list
data = pd.Series([1, 3, 5, 7, 9])
print(data)
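The Series above gets the default integer index (0 to 4). A sketch of the same idea with explicit labels shows how the index works as a second array alongside the data; the `temps` values and weekday labels are made up for illustration.

```python
import pandas as pd

# A Series with explicit string labels instead of the default 0..n-1 index
temps = pd.Series([21.5, 19.0, 24.3], index=['Mon', 'Tue', 'Wed'])

print(temps.index)    # the labels
print(temps['Tue'])   # label-based access → 19.0
print(temps.iloc[0])  # position-based access → 21.5
print(temps.values)   # the underlying data as a NumPy array
```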

### DataFrame

A DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns).

In [None]:
# Creating a DataFrame from a dictionary
data = {
    'Country': ['Belgium', 'India', 'Brazil'],
    'Capital': ['Brussels', 'New Delhi', 'Brasília'],
    'Population': [11190846, 1303171035, 207847528]
}
df = pd.DataFrame(data)
print(df)

   Country    Capital  Population
0  Belgium   Brussels    11190846
1    India  New Delhi  1303171035
2   Brazil   Brasília   207847528
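To make "two-dimensional, size-mutable, potentially heterogeneous" concrete, the sketch below inspects the shape and per-column dtypes, then adds and removes a column. It uses a separate variable, `countries`, so it does not overwrite `df`; the `Continent` column is a hypothetical addition for illustration.

```python
import pandas as pd

countries = pd.DataFrame({
    'Country': ['Belgium', 'India', 'Brazil'],
    'Capital': ['Brussels', 'New Delhi', 'Brasília'],
    'Population': [11190846, 1303171035, 207847528],
})

print(countries.shape)             # (rows, columns) → (3, 3)
print(countries.columns.tolist())  # the column labels
print(countries.dtypes)            # heterogeneous: text columns plus an integer column

# Size-mutable: columns can be added and removed after creation
countries['Continent'] = ['Europe', 'Asia', 'South America']
print(countries.shape)             # → (3, 4)
countries = countries.drop(columns='Continent')
print(countries.shape)             # → (3, 3)
```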


### Reading Data

Pandas can read and write many file formats, including CSV, Excel, JSON, HTML, and HDF5. Here, we focus on reading data from a CSV file.

In [None]:
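# Loading data from CSV into a separate variable, so `df` (the countries table) is still available below
sales_df = pd.read_csv('sample.csv')
print(sales_df.head())  # Display the first rows (up to 5) of the DataFrame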

       Item  Amount  Cost
0    Laptop      25   500
1     Mouse      33    99
2  Keyboard      10   250
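If `sample.csv` is not at hand, the same cell can be reproduced without a file: `read_csv` accepts any file-like object, such as `io.StringIO`. The CSV text below is an assumption reconstructed from the printed output above, not the original file.

```python
import io
import pandas as pd

# Stand-in for sample.csv (contents assumed from the printed output above)
csv_text = """Item,Amount,Cost
Laptop,25,500
Mouse,33,99
Keyboard,10,250
"""

# read_csv accepts a file path or any file-like object
sales_df = pd.read_csv(io.StringIO(csv_text))
print(sales_df.head())

# Writing back out: index=False leaves the 0, 1, 2 row labels out of the CSV
round_trip = pd.read_csv(io.StringIO(sales_df.to_csv(index=False)))
print(round_trip.equals(sales_df))  # → True
```

Passing `index=False` matters here: without it, reading the output back would produce an extra unnamed column holding the old row labels.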


### Data Manipulation

#### Selecting Data

Accessing a specific column

In [None]:
print(df['Capital'])

0     Brussels
1    New Delhi
2     Brasília
Name: Capital, dtype: object
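A single column label returns a Series, as shown above; passing a list of labels returns a DataFrame instead. A short sketch, using a separate `countries` table with the same data:

```python
import pandas as pd

countries = pd.DataFrame({
    'Country': ['Belgium', 'India', 'Brazil'],
    'Capital': ['Brussels', 'New Delhi', 'Brasília'],
    'Population': [11190846, 1303171035, 207847528],
})

capitals = countries['Capital']             # single label → Series
subset = countries[['Country', 'Capital']]  # list of labels → DataFrame

print(type(capitals).__name__)  # Series
print(type(subset).__name__)    # DataFrame
print(subset)
```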


Slicing rows

In [None]:
print(df[0:2])

   Country    Capital  Population
0  Belgium   Brussels    11190846
1    India  New Delhi  1303171035
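`df[0:2]` slices rows by position. The explicit accessors are `iloc` (by position, end excluded) and `loc` (by label, end included); with the default 0, 1, 2 index both select the same rows here. A sketch:

```python
import pandas as pd

countries = pd.DataFrame({
    'Country': ['Belgium', 'India', 'Brazil'],
    'Capital': ['Brussels', 'New Delhi', 'Brasília'],
    'Population': [11190846, 1303171035, 207847528],
})

by_position = countries.iloc[0:2]  # positions 0 and 1; the end (2) is excluded
by_label = countries.loc[0:1]      # labels 0 through 1; the end (1) is included
print(by_position.equals(by_label))  # → True

# loc also selects a single cell by row label and column label
print(countries.loc[1, 'Capital'])   # → New Delhi
```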


Data Filtering

In [None]:
# Filtering data based on a condition
filtered_data = df[df['Population'] > 1000000]
print(filtered_data)

   Country    Capital  Population
0  Belgium   Brussels    11190846
1    India  New Delhi  1303171035
2   Brazil   Brasília   207847528
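Since every country in this table exceeds one million, the filter above keeps all rows. The sketch below uses a higher threshold, combines conditions with `&` (each condition in parentheses), and tests membership with `isin`:

```python
import pandas as pd

countries = pd.DataFrame({
    'Country': ['Belgium', 'India', 'Brazil'],
    'Capital': ['Brussels', 'New Delhi', 'Brasília'],
    'Population': [11190846, 1303171035, 207847528],
})

# Single condition: countries with more than 100 million people
large = countries[countries['Population'] > 100_000_000]

# Combine conditions with & (and) or | (or); parentheses are required
large_not_india = countries[(countries['Population'] > 100_000_000)
                            & (countries['Country'] != 'India')]

# isin() keeps rows whose value appears in the given list
selected = countries[countries['Country'].isin(['Belgium', 'Brazil'])]

print(large['Country'].tolist())            # → ['India', 'Brazil']
print(large_not_india['Country'].tolist())  # → ['Brazil']
print(selected['Country'].tolist())         # → ['Belgium', 'Brazil']
```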


#### Data Operations

Applying functions to data

In [None]:
df['Population in Millions'] = df['Population'].apply(lambda x: x / 1000000)
print(df)

   Country    Capital  Population  Population in Millions
0  Belgium   Brussels    11190846               11.190846
1    India  New Delhi  1303171035             1303.171035
2   Brazil   Brasília   207847528              207.847528
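`apply` calls the lambda once per element. For simple arithmetic like this, dividing the whole column directly (vectorized) gives the same values and is typically faster on large columns. A sketch comparing the two:

```python
import pandas as pd

countries = pd.DataFrame({
    'Country': ['Belgium', 'India', 'Brazil'],
    'Population': [11190846, 1303171035, 207847528],
})

# Element-wise function call, as in the cell above
via_apply = countries['Population'].apply(lambda x: x / 1_000_000)

# Vectorized arithmetic on the whole column at once
vectorized = countries['Population'] / 1_000_000

# Raises AssertionError if the two Series differ
pd.testing.assert_series_equal(via_apply, vectorized)
print(vectorized.round(2))
```

`apply` remains the right tool when the per-element logic cannot be expressed with column operations.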


#### Basic Data Analysis

Descriptive statistics

In [None]:
print(df.describe())

         Population  Population in Millions
count  3.000000e+00                3.000000
mean   5.074031e+08              507.403136
std    6.961346e+08              696.134595
min    1.119085e+07               11.190846
25%    1.095192e+08              109.519187
50%    2.078475e+08              207.847528
75%    7.555093e+08              755.509282
max    1.303171e+09             1303.171035
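Each row of `describe()` corresponds to a method you can call on its own, and calling `describe()` on a text column gives a different summary (count, number of unique values, most frequent value, and its frequency). A sketch:

```python
import pandas as pd

countries = pd.DataFrame({
    'Country': ['Belgium', 'India', 'Brazil'],
    'Population': [11190846, 1303171035, 207847528],
})

stats = countries['Population'].describe()
print(countries['Population'].mean())    # same value as stats['mean']
print(countries['Population'].median())  # same value as stats['50%'] → 207847528.0

# For a text column, describe() reports count, unique, top and freq
text_stats = countries['Country'].describe()
print(text_stats)
```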


Finding unique values

In [None]:
print(df['Country'].unique())

['Belgium' 'India' 'Brazil']
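In this table every country appears once, so `unique()` simply returns all values. With repeated values, it returns each distinct value once, in order of first appearance, and `nunique()` counts them. The `visits` data below is hypothetical:

```python
import pandas as pd

# Hypothetical column with repeated values
visits = pd.Series(['Belgium', 'India', 'Belgium', 'Brazil', 'India', 'Belgium'])

print(visits.unique())   # distinct values in order of first appearance
print(visits.nunique())  # number of distinct values → 3
```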


Counting values

In [None]:
print(df['Country'].value_counts())

Country
Belgium    1
India      1
Brazil     1
Name: count, dtype: int64
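`value_counts()` is more informative when values repeat. It sorts by count in descending order, and `normalize=True` returns proportions instead of counts. Using the same hypothetical `visits` data:

```python
import pandas as pd

# Hypothetical column with repeated values
visits = pd.Series(['Belgium', 'India', 'Belgium', 'Brazil', 'India', 'Belgium'],
                   name='Country')

counts = visits.value_counts()                # Belgium 3, India 2, Brazil 1
shares = visits.value_counts(normalize=True)  # Belgium 0.5, India ~0.33, Brazil ~0.17
print(counts)
print(shares)
```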
