# The Pandas Library: Basic Functionality

Pandas is a Python library for data analysis and manipulation. It provides labeled data structures and functions for working with structured (tabular) data.

This notebook covers the basic functionality of pandas and the functions you are likely to use most often when analyzing data in Python.

In [None]:
# Importing the pandas library
import os
import pandas as pd

# Show the version of pandas
print(f'The installed version of pandas is: {pd.__version__}')

# Show the current working directory
print(f'\nThe current working directory is: {os.getcwd()}')



## Creating DataFrames and Series

Pandas has two primary data structures:

1. **Series** - 1-dimensional labeled array
2. **DataFrame** - 2-dimensional labeled data structure

Let's see how to create both.


In [None]:
# Creating a Series
data = [1, 2, 3, 4, 5]
series = pd.Series(data, name="Numbers")
series
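
The "labeled" part of a Series is easier to see with an explicit index. The following is a minimal sketch; the labels `a`, `b`, `c` and the name `Scores` are made up for illustration.

```python
import pandas as pd

# Hypothetical labels chosen only for illustration
labeled = pd.Series([10, 20, 30], index=["a", "b", "c"], name="Scores")

# Access one value by its label and one by its integer position
by_label = labeled["b"]
by_position = labeled.iloc[0]

print(labeled)
print(by_label, by_position)
```

Without an explicit `index`, pandas falls back to the default integer labels 0, 1, 2, and so on.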


In [None]:
# Creating a Python dictionary with data
data = {
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [24, 27, 22, 32],
    "City": ["New York", "San Francisco", "Los Angeles", "Chicago"]
}

# Converting the dictionary to a DataFrame
df = pd.DataFrame(data)
df


## Inspecting Data

Once you have a DataFrame, you might want to inspect it. Some key methods include:
- `head()`: View the first few rows of the DataFrame.
- `tail()`: View the last few rows of the DataFrame.
- `info()`: Get a summary of the DataFrame.
- `describe()`: Get a statistical summary of numerical columns.


In [None]:
# Viewing first few rows
df.head()


In [None]:
# Viewing last few rows
df.tail()


In [None]:
# Summary information of the DataFrame
df.info()


In [None]:
# Statistical summary of numerical columns
df.describe()
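
Besides the methods above, a few attributes give a quick overview of a DataFrame. This sketch uses a small stand-in frame named `people` (a hypothetical name, not the `df` from this notebook) so that it runs on its own.

```python
import pandas as pd

# Small stand-in frame so the example is self-contained
people = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [24, 27]})

print(people.shape)             # (number of rows, number of columns)
print(people.columns.tolist())  # column names as a plain list
print(people.dtypes)            # data type of each column
```

Note that `shape`, `columns`, and `dtypes` are attributes, so they are accessed without parentheses.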



## Data Selection

You can select data in a DataFrame using column names, or by using slicing.

- Selecting a single column returns a Series.
- Selecting multiple columns returns a DataFrame.


In [None]:
# Selecting a single column
df['Name']


In [None]:
# Selecting multiple columns
df[['Name', 'Age']]



### Selecting Rows

Pandas provides two main ways to select rows:
- **loc**: Select by index label (also accepts a boolean mask).
- **iloc**: Select by integer position.


In [None]:
# Selecting rows with loc and a boolean mask (rows where Name is 'Alice')
df.loc[df['Name'] == 'Alice']


In [None]:
# Selecting rows by index position (iloc)
df.iloc[2]  # Third row
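
With the default integer index, labels and positions coincide, which hides the difference between `loc` and `iloc`. The sketch below uses a hypothetical frame with string labels (`x`, `y`, `z`) to make the distinction visible.

```python
import pandas as pd

# Hypothetical frame with string labels so labels and positions differ
people = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [24, 27, 22]},
    index=["x", "y", "z"],
)

row_by_label = people.loc["y"]      # row whose index label is "y"
row_by_position = people.iloc[1]    # second row, whatever its label is
label_slice = people.loc["x":"y"]   # label slices include the end label
position_slice = people.iloc[0:1]   # position slices exclude the end position

print(label_slice)
print(position_slice)
```

The slicing behavior is the part that most often surprises: `loc` slices are inclusive on both ends, while `iloc` slices follow the usual Python half-open convention.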



## Filtering Data

To filter a DataFrame, index it with a boolean condition. Only the rows where the condition is `True` are kept.


In [None]:
# Filter rows where Age > 25
df[df['Age'] > 25]
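
Conditions can also be combined. The sketch below rebuilds the sample data under the hypothetical name `people` so that it runs on its own, then filters with `&` and with `isin`.

```python
import pandas as pd

people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [24, 27, 22, 32],
    "City": ["New York", "San Francisco", "Los Angeles", "Chicago"],
})

# Wrap each condition in parentheses; use & for AND and | for OR
older_not_chicago = people[(people["Age"] > 25) & (people["City"] != "Chicago")]

# isin() tests membership in a list of values
in_selected_cities = people[people["City"].isin(["New York", "Chicago"])]

print(older_not_chicago)
print(in_selected_cities)
```

The parentheses are required because `&` and `|` bind more tightly than comparison operators in Python.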



## Adding/Modifying Columns

New columns can be added to a DataFrame, or existing columns can be modified.


In [None]:
# Adding a new column 'Salary'
df['Salary'] = [70000, 80000, 50000, 120000]
df


In [None]:
# Modifying an existing column
df['Age'] = df['Age'] + 1  # Incrementing each age by 1
df
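
A new column can also be derived from an existing one. As a sketch, `assign` returns a new DataFrame with the extra column and leaves the original untouched; the column name `AgeInMonths` and the frame name `people` are made up for illustration.

```python
import pandas as pd

people = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [24, 27]})

# assign() returns a copy with the new column; `people` itself is not changed
with_months = people.assign(AgeInMonths=people["Age"] * 12)

print(with_months)
```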



## Dropping Columns or Rows

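You can remove columns or rows using the `drop` method. By default it returns a new DataFrame; passing `inplace=True` modifies the existing DataFrame instead.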


In [None]:
# Dropping the 'Salary' column
df.drop('Salary', axis=1, inplace=True)
df


In [None]:
# Dropping a row by index
df.drop(0, axis=0, inplace=True)  # Dropping the first row
df
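
Dropping a row does not renumber the remaining index labels. The following sketch, using a hypothetical stand-in frame named `people`, shows `drop` without `inplace` and one way to renumber the index afterwards with `reset_index`.

```python
import pandas as pd

people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [24, 27, 22],
})

# Without inplace=True, drop() returns a new frame and leaves `people` as is
trimmed = people.drop(index=0)

# The remaining rows keep their old labels (1 and 2) until the index is reset
renumbered = trimmed.reset_index(drop=True)

print(trimmed.index.tolist())
print(renumbered.index.tolist())
```

This matters whenever a later step selects rows by label, because the label `2` no longer refers to the third row once a row has been dropped.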



## Data Aggregation

Pandas provides many functions to perform data aggregation, like `sum`, `mean`, `min`, `max`, etc.


In [None]:
# Calculating the mean age
df['Age'].mean()
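
Several aggregations can be computed in one call with `agg`. This is a minimal sketch on a standalone Series named `ages` (a hypothetical name) holding the original sample ages.

```python
import pandas as pd

ages = pd.Series([24, 27, 22, 32], name="Age")

# Passing a list of function names returns one result per function
summary = ages.agg(["min", "max", "mean", "sum"])

print(summary)
```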



## Grouping Data

Using `groupby`, you can group data by a specific column and perform aggregate functions.


In [None]:
# Grouping by City and calculating mean age for each city
df.groupby('City')['Age'].mean()
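
Because every city in the sample data appears only once, each group above contains a single row. The sketch below uses made-up data with repeated cities so that the grouping has a visible effect.

```python
import pandas as pd

# Made-up data in which cities repeat
people = pd.DataFrame({
    "City": ["Chicago", "Chicago", "New York", "New York", "New York"],
    "Age": [30, 40, 20, 25, 30],
})

# Mean age and number of rows per city
stats = people.groupby("City")["Age"].agg(["mean", "count"])

print(stats)
```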



## Handling Missing Data

Pandas provides methods to handle missing data, like `fillna` and `dropna`.


In [None]:
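# Adding a missing value to demonstrate handling
# Cast to float first so the column can hold NaN (newer pandas versions
# may warn or raise when a missing value is assigned into an integer column)
df['Age'] = df['Age'].astype('float64')
# Row 0 was dropped earlier, so label 2 is now the second remaining row (Charlie)
df.loc[2, 'Age'] = None
df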


In [None]:
# Handling missing values by filling NaN with 0
df['Age'] = df['Age'].fillna(0)
df

In [None]:
# Dropping rows with missing values
# (has no effect here, because the NaN was already filled with 0 above)
df.dropna(inplace=True)
df
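
Before choosing between `fillna` and `dropna`, it helps to count the missing values. The sketch below uses a hypothetical stand-in frame named `people` and compares the two strategies side by side; filling with the column mean is just one possible choice.

```python
import pandas as pd

people = pd.DataFrame({
    "Name": ["Bob", "Charlie", "David"],
    "Age": [28.0, float("nan"), 33.0],
})

# Number of missing values in each column
missing_counts = people.isna().sum()

# Option 1: fill the gap with the mean of the known ages
filled = people["Age"].fillna(people["Age"].mean())

# Option 2: drop every row that contains a missing value
dropped = people.dropna()

print(missing_counts)
print(filled.tolist())
print(dropped)
```

Neither call modifies `people`, so both results can be inspected before deciding which one to keep.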



## Saving and Loading Data

Pandas can easily save DataFrames to files and read from files. Common formats are CSV, Excel, and JSON.


In [None]:
# Saving to a CSV file
df.to_csv("sample_data.csv", index=False)


In [None]:

# Loading from a CSV file
loaded_df = pd.read_csv("sample_data.csv")
loaded_df
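
JSON works in a similar way. As a sketch, the round trip below goes through an in-memory string instead of a file, so nothing is written to disk; the frame name `people` and the `orient="records"` layout are choices made for this example.

```python
from io import StringIO

import pandas as pd

people = pd.DataFrame({"Name": ["Bob", "David"], "Age": [28, 33]})

# With no path argument, to_json() returns the JSON text as a string
json_text = people.to_json(orient="records")

# Wrap the string in StringIO so read_json treats it as file-like input
restored = pd.read_json(StringIO(json_text), orient="records")

print(json_text)
print(restored)
```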



## Conclusion

This notebook covered the basics of pandas: creating Series and DataFrames, inspecting and selecting data, filtering, adding, modifying, and dropping columns or rows, aggregating and grouping, handling missing data, and saving and loading data.

### Jupyter notebook --footer info-- (please always provide this at the end of each submitted notebook)

In [None]:
import os
import platform
import socket
from platform import python_version
from datetime import datetime

print('-----------------------------------')
print(os.name.upper())
print(platform.system(), '|', platform.release())
print('Datetime:', datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
print('Python Version:', python_version())
print('-----------------------------------')