# Lesson: Introduction to basic libraries in Python

## Objective:
By the end of this lesson, students will be able to:

Understand the basics of NumPy, Matplotlib, Seaborn, and Pandas.\
Manipulate data, visualize it, and perform basic data analysis tasks.

## NumPy
**NumPy** (short for Numerical Python) is the fundamental package for scientific computing with Python. It provides support for arrays, matrices, and mathematical functions.


In [None]:
import numpy as np
# Imports the NumPy (pronounced "num-pie") library under the short alias np
print('imported numpy as np')


In [None]:
# Creating a NumPy array
arr = np.array([1, 2, 3, 4, 5])
print("NumPy Array:", arr)



In [None]:
# Basic operations
print("Sum:", np.sum(arr))
print("Mean:", np.mean(arr))
print("Max:", np.max(arr))



In [None]:
# Generating arrays
zeros_arr = np.zeros((2, 3))  # 2x3 array of zeros
ones_arr = np.ones((3, 2))    # 3x2 array of ones
print("zeros_arr:")
print(zeros_arr)
print("ones_arr:")
print(ones_arr)


In [None]:
# Array operations
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])
print("Array Addition:")
# Elements in matching positions are added together: 1+5 = 6, 2+6 = 8, etc.
print(arr1 + arr2)
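
Addition is not the only operation that works element by element. As a small sketch using the same two arrays, the example below contrasts element-wise multiplication (`*`) with the matrix product (`@`). The variable names `elementwise` and `matrix_product` are chosen here purely for illustration.

```python
import numpy as np

arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])

# Element-wise multiplication: each element is multiplied by the element
# in the same position of the other array (1*5, 2*6, 3*7, 4*8)
elementwise = arr1 * arr2

# Matrix product: rows of arr1 are combined with columns of arr2
# (top-left entry is 1*5 + 2*7 = 19)
matrix_product = arr1 @ arr2

print("Element-wise product:")
print(elementwise)
print("Matrix product:")
print(matrix_product)
```

The two results differ, so it is worth checking which kind of multiplication a calculation actually needs.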

## Matplotlib
Matplotlib is a plotting library for creating static, interactive, and animated visualizations in Python.

In [None]:
import matplotlib.pyplot as plt
print('imported matplotlib.pyplot as plt')



Use NumPy to create an array of x values and compute the sine of each one.\
Then use Matplotlib (imported as plt) to plot the values.
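
Before plotting, it can help to see what `np.linspace` actually returns. The sketch below uses only 5 points instead of 100 so the values are easy to read. The names `x_small` and `y_small` are illustrative and are not used elsewhere in the lesson.

```python
import numpy as np

# 5 evenly spaced values from 0 to 10; both endpoints are included
x_small = np.linspace(0, 10, 5)

# np.sin is applied to every element of the array at once
y_small = np.sin(x_small)

print("x values:", x_small)
print("sin(x):  ", y_small)
```

With 100 points instead of 5, the plotted curve looks smooth because the points are much closer together.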

In [None]:
# Line plot
x = np.linspace(0, 10, 100) # use NumPy to create an array called x that
                            # contains 100 evenly spaced numbers from 0 to 10
y = np.sin(x)           # get the sine of each x
plt.plot(x, y)          # plot the x and y values
plt.title('Sine Wave')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()              # display the plot



We can also use NumPy to create random numbers.\
Then we can use plt to draw a scatter plot.
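
Random numbers change on every run, which can make results hard to compare. One common approach, shown here as a sketch, is to create a seeded generator with `np.random.default_rng`. The seed value 42 is an arbitrary choice for this example.

```python
import numpy as np

# A generator created with a fixed seed produces the same sequence each time
rng = np.random.default_rng(seed=42)
x = rng.random(50)   # 50 random floats in the range [0, 1)
y = rng.random(50)

# A second generator with the same seed repeats the same first 50 values
rng_again = np.random.default_rng(seed=42)
x_again = rng_again.random(50)

print("Same values with the same seed:", np.array_equal(x, x_again))
```

Leaving out the seed gives different numbers on every run, which is what the scatter plot in this lesson does.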

In [None]:
# Scatter plot
x = np.random.rand(50)
y = np.random.rand(50)
plt.scatter(x, y)
plt.title('Random Scatter Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()

## Seaborn
**Seaborn** is a Python visualization library based on Matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.

In [None]:
import seaborn as sns

In [None]:
tips = sns.load_dataset("tips") # load seaborn's example dataset called tips
                                # (downloaded and cached on first use) so we can do some testing
print(tips)

In [None]:
# Scatter plot with Seaborn
sns.scatterplot(data=tips, x="total_bill", y="tip", hue="time")
plt.title('Total Bill vs Tip')
plt.xlabel('Total Bill')
plt.ylabel('Tip')
plt.show()

In [None]:
# Distribution plot
sns.histplot(data=tips, x="total_bill", kde=True)
plt.title('Distribution of Total Bill')
plt.xlabel('Total Bill')
plt.ylabel('Frequency')
plt.show()

By now you can probably see that, rather than writing your own program to do statistical math (as NumPy does) or graphing and charting (as Seaborn and Matplotlib do), it is much easier to just import a library.\
We have barely scratched the surface of what these libraries can do, but you will probably never need to know or use all of their features.
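
To make that comparison concrete, here is a small sketch that computes a mean and a maximum twice: once with a hand-written loop and once with NumPy. The list `values` and the helper variable names are made up for this illustration.

```python
import numpy as np

values = [1, 2, 3, 4, 5]

# Without a library: loop over the values and keep track of everything by hand
total = 0
largest = values[0]
for v in values:
    total += v
    if v > largest:
        largest = v
manual_mean = total / len(values)

# With NumPy: one function call per statistic
arr = np.array(values)
print("Manual mean:", manual_mean, "| NumPy mean:", np.mean(arr))
print("Manual max: ", largest, "| NumPy max: ", np.max(arr))
```

Both approaches give the same answers; the library version is simply shorter and less error-prone.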

## Pandas
**Pandas** is a fast, powerful, and flexible open-source data analysis and manipulation library built on top of Python.
When you need to work with large data files, such as those used for image classification (think JetBot collision avoidance), pandas is a great option.

In [None]:
import pandas as pd
print("imported pandas as pd")

In [None]:
# Creating a DataFrame
# A DataFrame is something like an Excel spreadsheet
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 40],
        'Salary': [50000, 60000, 70000, 80000]}
df = pd.DataFrame(data) # build a DataFrame from the dictionary and call it df for short
print("DataFrame:")
print(df)

Pandas has built-in functions to easily find the mean and max, to sort data, and to group data.
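
Sorting and grouping are easiest to see on a table with a repeated category. The sketch below uses `sort_values` and `groupby` on a small DataFrame. The `Department` column and the `staff` name are hypothetical additions for this example and are not part of the lesson's data.

```python
import pandas as pd

staff = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Department': ['Sales', 'IT', 'Sales', 'IT'],   # hypothetical column
    'Salary': [50000, 60000, 70000, 80000],
})

# Sort rows from highest to lowest salary
by_salary = staff.sort_values('Salary', ascending=False)
print("Sorted by salary:")
print(by_salary)

# Group rows by department, then average the salaries within each group
dept_mean = staff.groupby('Department')['Salary'].mean()
print("Average salary by department:")
print(dept_mean)
```

Grouping is most useful when several rows share the same value, as the two departments do here.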

In [None]:
# Basic DataFrame operations
print("Mean Salary:", df['Salary'].mean()) 
print("Maximum Age:", df['Age'].max())


In [None]:
# Filtering data
print("Employees older than 30:")
print(df[df['Age'] > 30])




In [None]:
# Grouping and aggregation
print("Average salary by age:")
print(df.groupby('Age')['Salary'].mean())

In [None]:
# Data visualization with Pandas
df.plot(kind='bar', x='Name', y='Salary', title='Employee Salaries')
plt.xlabel('Name')
plt.ylabel('Salary')
plt.show()

## 1 NumPy Exercises
Create a NumPy array with random integers and perform basic statistical operations on it.
When completed, the output should look something like this.\
Your numbers will vary.\
Random Array: [ 2 41 92 23  5 32 95 52 16 55]\
Sum: 413\
Mean: 41.3\
Max: 95

In [None]:
import numpy as np
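# Replace start, stop, and how_many with your own numbers, for example 1, 100, 10
# (start is included, stop is excluded, how_many is the number of values)
random_array = np.random.randint(start, stop, how_many)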
print("Random Array:", random_array)
print("Sum:", np.sum(random_array))
# Print the mean
# Print the max

## 2 Matplotlib Exercises
Plot the graph of a mathematical function (e.g., quadratic) using Matplotlib.\
It should look like this.\
![image.png](attachment:image.png)

In [None]:
# Import matplotlib.pyplot and name it plt
# Replace start, stop, and amount with -5, 5, and 100
x = np.linspace(start, stop, amount)
y = x ** 2
plt.plot(x, y)
plt.title('Quadratic Function')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()

## 3 Seaborn Exercises
Load a dataset and create a box plot showing the distribution of a numerical variable.\
![image.png](attachment:image.png)

In [None]:
#import seaborn as sns
iris = sns.load_dataset("iris")
print(iris)
# Fill in the blanks on the next line with: data=iris, x='species', y='petal_length'
sns.boxplot(data=, x=, y=)
plt.title('Petal Length Distribution by Species')
plt.xlabel('Species')
plt.ylabel('Petal Length')
plt.show()

## 4 Pandas Exercises
Read a CSV file into a DataFrame and display the first few rows.\
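You will need to uncomment the `print(df.head())` and `print(df.describe())` lines (and the import on the first line if pandas is not already imported).\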
Your output should look like this.\
![image.png](attachment:image.png)

In [None]:
# import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv')
print("First Few Rows of DataFrame:")
# print(df.head()) 

# Perform data cleaning operations such as handling missing values or removing duplicates from a DataFrame.
# Here we'll remove duplicates
df_cleaned = df.drop_duplicates()
print("Original DataFrame Shape:", df.shape)
print("DataFrame Shape after Removing Duplicates:", df_cleaned.shape)

# Calculate summary statistics (e.g., mean, median, standard deviation) for numerical columns in a DataFrame.
print("Summary Statistics:")
# print(df.describe())
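
The cell above mentions handling missing values but only demonstrates removing duplicates. As a minimal sketch, the example below builds a tiny DataFrame with gaps and shows two common options, `dropna` and `fillna`. The `sample` DataFrame and its values are made up for illustration and are not taken from the tips file.

```python
import numpy as np
import pandas as pd

# A small made-up table with two missing values (NaN)
sample = pd.DataFrame({
    'total_bill': [16.99, np.nan, 21.01],
    'tip': [1.01, 1.66, np.nan],
})

print("Missing values per column:")
print(sample.isna().sum())

# Option 1: drop every row that contains a missing value
dropped = sample.dropna()
print("Rows left after dropna:", len(dropped))

# Option 2: fill each gap with the mean of its column
filled = sample.fillna(sample.mean())
print("After filling with column means:")
print(filled)
```

Which option is appropriate depends on the data: dropping rows loses information, while filling introduces estimated values.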