📝 **Author:** Amirhossein Heydari - 📧 **Email:** <amirhosseinheydari78@gmail.com> - 📍 **Origin:** [mr-pylin/numpy-workshop](https://github.com/mr-pylin/numpy-workshop)

---


**Table of contents**<a id='toc0_'></a>    
- [Dependencies](#toc1_)    
- [Looking ahead](#toc2_)    
  - [Pandas](#toc2_1_)    
    - [Create a data-frame](#toc2_1_1_)    
    - [Reading a csv file](#toc2_1_2_)    
    - [DataFrame Operations](#toc2_1_3_)    
  - [Matplotlib](#toc2_2_)    
    - [Basic Line Plot](#toc2_2_1_)    
    - [Bar Chart](#toc2_2_2_)    
    - [Scatter Plot](#toc2_2_3_)    
    - [Plotting Data from a DataFrame](#toc2_2_4_)    


# <a id='toc1_'></a>[Dependencies](#toc0_)


```python
import numpy as np
```

# <a id='toc2_'></a>[Looking ahead](#toc0_)


## <a id='toc2_1_'></a>[Pandas](#toc0_)

**Key Features:**

   1. **DataFrame and Series Structures**
      - `DataFrame`: Two-dimensional labeled data structure with columns of potentially different types.
      - `Series`: One-dimensional labeled array capable of holding any data type.

   2. **Data Manipulation**
      - Built-in handling of missing data: reductions such as `mean` and `sum` skip `NaN` values by default.
      - Label-based slicing, indexing, and subsetting of large datasets.

   3. **Data Cleaning**
      - Handling missing values with `dropna` (remove them) and `fillna` (replace them).
      - Removing duplicate rows with `drop_duplicates`.
      - Converting data types with `astype`.

   4. **Data Aggregation and Grouping**
      - Grouping data with `groupby` for split-apply-combine operations.
      - Aggregation methods such as `sum`, `mean`, and `count`.

   5. **Merging and Joining**
      - Combining datasets with `merge`, `join`, and `concat`.

   6. **Input and Output**
      - Reading from and writing to various file formats: CSV, Excel, SQL, JSON, etc.

📝 **Doc:**

- pandas documentation: [pandas.pydata.org/docs](https://pandas.pydata.org/docs/)
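The **Data Cleaning** helpers listed above can be tried on a tiny table. The sketch below uses made-up data (the names, ages and scores are assumptions for illustration only) containing one duplicated row and two missing values, and applies `drop_duplicates`, `fillna`, `dropna` and `astype` in turn.

```python
import numpy as np
import pandas as pd

# made-up table: rows 1 and 2 are exact duplicates, Charlie's age and David's score are missing
raw = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Bob", "Charlie", "David"],
        "Age": ["25", "30", "30", None, "40"],  # ages stored as text, as they often are after import
        "Score": [88.0, 79.0, 79.0, 72.5, np.nan],
    }
)

# drop_duplicates: keep only the first of each group of identical rows
clean = raw.drop_duplicates()

# fillna: replace the missing score with the mean score (mean skips NaN by default)
clean = clean.fillna({"Score": clean["Score"].mean()})

# dropna: drop any row that still has a missing value (Charlie's age)
clean = clean.dropna()

# astype: the remaining ages are all present, so they can be converted to integers
clean = clean.astype({"Age": "int64"})

print(clean)
print(clean.dtypes)
```

The order matters here: filling the score before calling `dropna` keeps David's row, while converting `Age` only after `dropna` avoids trying to turn a missing value into an integer.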


```python
import pandas as pd
```

### <a id='toc2_1_1_'></a>[Create a data-frame](#toc0_)


```python
data_1 = {
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 35, 40],
    "City": ["New York", "Los Angeles", "Chicago", "Houston"],
}

# data-frame: each dict key becomes a column, each list holds that column's values
df_1 = pd.DataFrame(data_1)

# log
print(df_1)
```

```text
      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
3    David   40      Houston
```
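Every column of a DataFrame is a `Series`, the one-dimensional labeled structure from the feature list. A short sketch of selecting a column, accessing single values by label (`.loc`) and by position (`.iloc`), and building a `Series` directly; the index labels `"a"` and `"b"` are arbitrary choices for illustration.

```python
import pandas as pd

df_1 = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie", "David"],
        "Age": [25, 30, 35, 40],
        "City": ["New York", "Los Angeles", "Chicago", "Houston"],
    }
)

# selecting one column returns a Series that shares the DataFrame's row index
ages = df_1["Age"]
print(type(ages).__name__, ages.dtype)

# label-based access: row label 2, column "Name"
print(df_1.loc[2, "Name"])

# position-based access: the last row, then its "City" entry
print(df_1.iloc[-1]["City"])

# a Series can also be built directly, here with a custom (hypothetical) index
cities = pd.Series(["New York", "Chicago"], index=["a", "b"])
print(cities["b"])
```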


### <a id='toc2_1_2_'></a>[Reading a csv file](#toc0_)


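```python
# read a comma-separated file into a DataFrame (the path is relative to the notebook's folder)
df_2 = pd.read_csv("../assets/txtfiles/file_1.csv")

# log: head() shows at most the first 5 rows
print(df_2.head())
```

```text
    A   B   C
0   1   2   3
1   4   5   6
2   7   8   9
3  10  11  12
```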
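Writing works the other way round with `to_csv`. So that it runs without `file_1.csv`, this sketch rebuilds a table with the same values as the printed output above (an assumption about the file's contents, based only on that output) and round-trips it through an in-memory text buffer from `io.StringIO` instead of a real file.

```python
import io

import pandas as pd

df = pd.DataFrame({"A": [1, 4, 7, 10], "B": [2, 5, 8, 11], "C": [3, 6, 9, 12]})

# with no path argument, to_csv returns the CSV text; index=False leaves out the row labels
csv_text = df.to_csv(index=False)
print(csv_text)

# read_csv accepts any file-like object, so the text can be parsed back without touching the disk
df_back = pd.read_csv(io.StringIO(csv_text))

# usecols loads only the listed columns
df_ac = pd.read_csv(io.StringIO(csv_text), usecols=["A", "C"])
print(df_ac)
```

Passing a filename such as `"out.csv"` to `to_csv` writes the same text to disk instead.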


### <a id='toc2_1_3_'></a>[DataFrame Operations](#toc0_)


```python
# same people as data_1, except that Charlie lives in New York, so one city has two rows
data_2 = {
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 35, 40],
    "City": ["New York", "Los Angeles", "New York", "Houston"],
}

# data-frame
df_3 = pd.DataFrame(data_2)
```

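```python
# filtering data: the comparison yields a boolean Series, and indexing with it keeps the True rows
filter_1 = df_3[df_3["Age"] >= 30]

# log
print(filter_1)
```

```text
      Name  Age         City
1      Bob   30  Los Angeles
2  Charlie   35     New York
3    David   40      Houston
```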
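Several conditions can be combined in one filter. This sketch rebuilds `df_3` from `data_2` and shows `&` / `|` on boolean masks (each comparison needs its own parentheses because of Python's operator precedence), membership tests with `isin`, and the same kind of filter written as a string with `query`.

```python
import pandas as pd

df_3 = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie", "David"],
        "Age": [25, 30, 35, 40],
        "City": ["New York", "Los Angeles", "New York", "Houston"],
    }
)

# & means "and", | means "or", ~ means "not"; wrap every comparison in parentheses
ny_over_30 = df_3[(df_3["Age"] > 30) & (df_3["City"] == "New York")]
print(ny_over_30)

# isin keeps rows whose value appears in the given list
la_or_houston = df_3[df_3["City"].isin(["Los Angeles", "Houston"])]
print(la_or_houston)

# query expresses a filter as a string expression over column names
young_or_houston = df_3.query("Age < 30 or City == 'Houston'")
print(young_or_houston)
```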


```python
# grouping data: split the rows by City, then average Age within each group
group_1 = df_3.groupby("City")["Age"].mean()

# log
print(group_1)
```

```text
City
Houston        40.0
Los Angeles    30.0
New York       30.0
Name: Age, dtype: float64
```
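`mean` is only one possible aggregation. A sketch using `agg`, assuming the same `df_3`, to compute several statistics per group at once, followed by pandas' named-aggregation form, where each keyword (`avg_age` and `people` here are arbitrary output names) maps to an `(input column, function)` pair.

```python
import pandas as pd

df_3 = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie", "David"],
        "Age": [25, 30, 35, 40],
        "City": ["New York", "Los Angeles", "New York", "Houston"],
    }
)

# several aggregations of one column, one output column per function
stats = df_3.groupby("City")["Age"].agg(["mean", "min", "max", "count"])
print(stats)

# named aggregation: output_name=(input_column, function)
summary = df_3.groupby("City").agg(avg_age=("Age", "mean"), people=("Name", "count"))
print(summary)
```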


```python
# adding a new column: the arithmetic is applied element-wise to the whole column
df_3["Age in 5 Years"] = df_3["Age"] + 5

# log
print(df_3)
```

```text
      Name  Age         City  Age in 5 Years
0    Alice   25     New York              30
1      Bob   30  Los Angeles              35
2  Charlie   35     New York              40
3    David   40      Houston              45
```
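The **Merging and Joining** feature can be sketched the same way. The `states` table and the extra row for `"Eve"` below are made-up data for illustration. `merge` performs an SQL-style join on a shared column (`how="left"` keeps every row of the left table and fills unmatched values with `NaN`), while `concat` stacks tables that share the same columns.

```python
import pandas as pd

people = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie", "David"],
        "City": ["New York", "Los Angeles", "New York", "Houston"],
    }
)

# hypothetical lookup table keyed by city (Houston is deliberately missing)
states = pd.DataFrame(
    {"City": ["New York", "Los Angeles", "Chicago"], "State": ["NY", "CA", "IL"]}
)

# left join on "City": every person is kept, David gets NaN because Houston has no match
merged = people.merge(states, on="City", how="left")
print(merged)

# concat stacks rows; ignore_index=True renumbers the result 0..n-1
more_people = pd.DataFrame({"Name": ["Eve"], "City": ["Chicago"]})
stacked = pd.concat([people, more_people], ignore_index=True)
print(stacked)
```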


## <a id='toc2_2_'></a>[Matplotlib](#toc0_)

**Key Features:**

1. **Various Plot Types:**
   - Line plots, scatter plots, bar charts, histograms, pie charts, etc.
   - Plot types can be combined on the same axes to build more complex visualizations.

2. **Customization:**
   - Extensive customization options for plots (colors, labels, legends, etc.).
   - Control over every aspect of a figure, including lines, markers, fonts, and more.

3. **Interactive Plots:**
   - Interactive figure windows (with a GUI backend) that support zooming, panning, and live updates.
   - Integration with Jupyter Notebooks for interactive data exploration.

4. **Subplots:**
   - Create multiple plots in a single figure using `subplot` and `subplots`.
   - Grid layout management for complex visualizations.

5. **Integration with NumPy and Pandas:**
   - Direct plotting from NumPy arrays and pandas DataFrames.
   - Works directly with results from other scientific libraries such as SciPy, which also produce NumPy arrays.

6. **Saving Figures:**
   - Export plots to various file formats: PNG, PDF, SVG, etc.
   - Control over figure size (`figsize`) and resolution (`dpi`) for publication-ready output.

7. **3D Plotting:**
   - Support for 3D plotting using `mpl_toolkits.mplot3d`.
   - Creation of 3D line plots, surface plots, and scatter plots.

📝 **Doc:**

- Matplotlib documentation: [matplotlib.org/stable/index.html](https://matplotlib.org/stable/index.html)
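The **Subplots** and **Saving Figures** features combine naturally: `plt.subplots` returns a Figure together with a grid of Axes, and `fig.savefig` exports the whole figure. In this sketch the plotted data are arbitrary, and the figure is written to an in-memory `io.BytesIO` buffer so nothing lands on disk; passing a filename such as `"figure.png"` or `"figure.pdf"` would instead pick the format from the extension.

```python
import io

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

# one Figure holding a 1x2 grid of Axes; figsize is given in inches
fig, axes = plt.subplots(1, 2, figsize=(8, 3))

axes[0].plot(x, np.sin(x), color="tab:blue", label="sin(x)")
axes[0].set_title("Line")
axes[0].legend()

# histogram of 500 normally distributed samples from a seeded generator
axes[1].hist(np.random.default_rng(0).normal(size=500), bins=20, color="tab:orange")
axes[1].set_title("Histogram")

# shrink margins so titles and tick labels do not overlap
fig.tight_layout()

# format= is needed when saving to a buffer; dpi sets the raster resolution
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=150)
print(len(buffer.getvalue()), "bytes of PNG data")

plt.show()
```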


```python
import matplotlib.pyplot as plt
```

### <a id='toc2_2_1_'></a>[Basic Line Plot](#toc0_)


```python
# sample data
x = [0, 1, 2, 3, 4]
y = [0, 1, 4, 9, 16]

# creating a line plot
plt.plot(x, y)
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Basic Line Plot")
plt.show()
```
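The same plot can be written in Matplotlib's object-oriented style, where the Figure and Axes are created explicitly and methods are called on the Axes (`ax.set_xlabel` instead of `plt.xlabel`). This sketch also adds a second line, markers, a dashed line style, and a legend built from each line's `label`; those extras are illustrative choices, not part of the original example.

```python
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
y = [0, 1, 4, 9, 16]

# create the Figure and a single Axes explicitly
fig, ax = plt.subplots()

# marker draws a symbol at each data point; linestyle="--" gives a dashed line
ax.plot(x, y, marker="o", linestyle="--", label=r"$y = x^2$")
ax.plot(x, x, label=r"$y = x$")

ax.set_xlabel("X-axis")
ax.set_ylabel("Y-axis")
ax.set_title("Basic Line Plot (object-oriented API)")

# legend collects the label of every line on this Axes
ax.legend()
plt.show()
```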

### <a id='toc2_2_2_'></a>[Bar Chart](#toc0_)


```python
# sample data
categories = ["A", "B", "C", "D"]
values = [3, 7, 5, 4]

# creating a bar chart
plt.bar(categories, values)
plt.xlabel("Categories")
plt.ylabel("Values")
plt.title("Bar Chart")
plt.show()
```
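Two common variations of the bar chart, sketched side by side: writing each value on top of its bar with `Axes.bar_label` (this assumes Matplotlib 3.4 or newer, where that method was added) and drawing the same data horizontally with `barh`. The colors are arbitrary.

```python
import matplotlib.pyplot as plt

categories = ["A", "B", "C", "D"]
values = [3, 7, 5, 4]

fig, (ax_v, ax_h) = plt.subplots(1, 2, figsize=(8, 3))

# vertical bars; bar() returns a BarContainer holding one Rectangle per bar
bars = ax_v.bar(categories, values, color="tab:green")
# annotate each bar with its height (requires Matplotlib >= 3.4)
ax_v.bar_label(bars)
ax_v.set_title("Bar Chart")

# barh draws the same data with horizontal bars
ax_h.barh(categories, values, color="tab:purple")
ax_h.set_title("Horizontal Bar Chart")

fig.tight_layout()
plt.show()
```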

### <a id='toc2_2_3_'></a>[Scatter Plot](#toc0_)


```python
# sample data: 50 uniform random values in [0, 1) for each axis
x = np.random.rand(50)
y = np.random.rand(50)

# creating a scatter plot
plt.scatter(x, y)
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Scatter Plot")
plt.show()
```
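Because `np.random.rand` draws new numbers on every run, the scatter plot above changes each time. A sketch using a seeded `np.random.default_rng` Generator (seed `42` is an arbitrary choice) so the points are reproducible, and mapping a third and fourth value onto point color (`c=`) and marker area (`s=`, in points squared), with a colorbar explaining the colors.

```python
import matplotlib.pyplot as plt
import numpy as np

# a seeded Generator produces the same "random" points on every run
rng = np.random.default_rng(42)
x = rng.random(50)
y = rng.random(50)
sizes = 300 * rng.random(50)

fig, ax = plt.subplots()

# c= maps each point's value to a color through the colormap; s= sets marker area
sc = ax.scatter(x, y, c=y, s=sizes, cmap="viridis", alpha=0.7)

# the colorbar shows which color corresponds to which y value
fig.colorbar(sc, ax=ax, label="y value")

ax.set_xlabel("X-axis")
ax.set_ylabel("Y-axis")
ax.set_title("Scatter Plot (color and size mapped)")
plt.show()
```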

### <a id='toc2_2_4_'></a>[Plotting Data from a DataFrame](#toc0_)


```python
# creating a DataFrame
data_3 = {"Year": [2010, 2011, 2012, 2013, 2014], "Sales": [100, 120, 150, 180, 200]}
df_4 = pd.DataFrame(data_3)

# plotting data from the DataFrame: each column is passed to plt.plot as a sequence of values
plt.plot(df_4["Year"], df_4["Sales"])
plt.xlabel("Year")
plt.ylabel("Sales")
plt.title("Yearly Sales")
plt.show()
```
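pandas also provides its own plotting wrapper: `DataFrame.plot` draws with Matplotlib and returns the Axes it used, so the usual Axes methods still apply afterwards. A sketch with the same `df_4` data, drawn once as a line and once as a bar chart through the `.plot.bar` accessor (`rot=0` keeps the year labels horizontal).

```python
import matplotlib.pyplot as plt
import pandas as pd

df_4 = pd.DataFrame({"Year": [2010, 2011, 2012, 2013, 2014], "Sales": [100, 120, 150, 180, 200]})

# line plot straight from the DataFrame; extra keywords such as marker go to Matplotlib
ax = df_4.plot(x="Year", y="Sales", marker="o", title="Yearly Sales", legend=False)
ax.set_ylabel("Sales")

# the .plot.bar accessor draws the same columns as a bar chart in a new figure
ax_bar = df_4.plot.bar(x="Year", y="Sales", legend=False, rot=0)
ax_bar.set_ylabel("Sales")

plt.show()
```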