# NumPy

- NumPy, short for "Numerical Python," is a fundamental package for numerical computations in Python.
- It provides support for arrays, matrices, and various mathematical operations, making it a powerful tool for scientific computing.


NumPy Official Documentation: https://numpy.org/doc/stable/

NumPy Quickstart Tutorial: https://numpy.org/doc/stable/user/quickstart.html


---

To use NumPy, you need to install it first. Run the following command in a notebook cell; in a terminal or command prompt, omit the leading exclamation mark:

In [1]:
!pip install numpy



In Python, you import the NumPy library using the **import** statement. The alias np is the common convention:

In [2]:
import numpy as np

### Creating Arrays
- The core data structure in NumPy is the ndarray (n-dimensional array).
- Arrays can be created from lists or tuples using the np.array() function.

In [3]:
# Create a 1D array
arr_1d = np.array([1, 2, 3, 4, 5])
print(arr_1d)

# Create a 2D array (matrix)
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
print(arr_2d)

[1 2 3 4 5]
[[1 2 3]
 [4 5 6]]


### Array Properties

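- Arrays have attributes such as shape (the length of each axis), size (the total number of elements), and ndim (the number of dimensions).
- Access these attributes using dot notation, for example arr.shape.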

In [4]:
arr = np.array([[1, 2, 3], [4, 5, 6]])
print("Shape:", arr.shape)  # Shape of the array
print("Size:", arr.size)    # Number of elements
print("Dimensions:", arr.ndim)  # Number of dimensions

Shape: (2, 3)
Size: 6
Dimensions: 2
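
As a quick check on how these attributes relate, the sketch below compares a 1D and a 2D array. The names `vec` and `mat` are illustrative and do not appear in the cells above. For any array, `size` equals the product of the entries in `shape`, and `ndim` equals the number of entries in `shape`.

```python
import numpy as np

vec = np.array([1, 2, 3, 4, 5])          # 1D array
mat = np.array([[1, 2, 3], [4, 5, 6]])   # 2D array

for name, a in [("vec", vec), ("mat", mat)]:
    print(name, "shape:", a.shape, "size:", a.size, "ndim:", a.ndim)

# size is the product of the shape entries; ndim is the number of entries
print(mat.size == mat.shape[0] * mat.shape[1])  # True
print(vec.ndim == len(vec.shape))               # True
```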


### Array Operations

- Arithmetic operators such as + and * act element-wise on arrays of the same shape.
- Mathematical operations can therefore be performed directly on arrays, without writing loops.

In [5]:
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# Element-wise addition
result = arr1 + arr2
print("Result:", result)

# Element-wise multiplication
result = arr1 * arr2
print("Result:", result)

Result: [5 7 9]
Result: [ 4 10 18]
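
To make the element-wise behavior concrete, the following sketch contrasts arrays with plain Python lists and shows a scalar being applied to every element. The names `list1` and `list2` are illustrative only.

```python
import numpy as np

list1 = [1, 2, 3]
list2 = [4, 5, 6]
print(list1 + list2)   # lists concatenate: [1, 2, 3, 4, 5, 6]

arr1 = np.array(list1)
arr2 = np.array(list2)
print(arr1 + arr2)     # arrays add element-wise: [5 7 9]

# A scalar is applied to every element
print(arr1 * 10)       # [10 20 30]
print(arr2 - arr1)     # [3 3 3]
```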


### Universal Functions (ufuncs)

- NumPy provides many universal functions for common mathematical operations.
- They operate element-wise on arrays and produce arrays as output.


In [6]:
arr = np.array([1, 2, 3])

# Square root
sqrt_arr = np.sqrt(arr)
print("Square Root:", sqrt_arr)

# Exponential
exp_arr = np.exp(arr)
print("Exponential:", exp_arr)

Square Root: [1.         1.41421356 1.73205081]
Exponential: [ 2.71828183  7.3890561  20.08553692]
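
As a further illustration, the sketch below shows that the + operator on arrays corresponds to the ufunc `np.add`, and that a ufunc applied to a 2D array returns an array of the same shape. The name `grid` is made up for this example.

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# np.add is the ufunc behind the + operator on arrays
print(np.add(arr1, arr2))                               # [5 7 9]
print(np.array_equal(np.add(arr1, arr2), arr1 + arr2))  # True

# A ufunc applied to a 2D array returns an array of the same shape
grid = np.array([[1, 4], [9, 16]])
roots = np.sqrt(grid)
print(roots)
print(roots.shape)                                      # (2, 2)
```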


### Array Indexing and Slicing

- You can access elements of an array using indexing and slicing.
- Remember that indexing starts from 0, and that a slice start:stop excludes the element at stop.


In [7]:
arr = np.array([10, 20, 30, 40, 50])

# Accessing individual elements
print("Element at index 2:", arr[2])

# Slicing to get a range of elements
print("Sliced array:", arr[1:4])  # Elements from index 1 to 3

Element at index 2: 30
Sliced array: [20 30 40]
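
The same rules extend to negative indices, step slices, and 2D arrays. The sketch below reuses the values from the earlier cells; in a 2D array the first index selects the row and the second selects the column.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])
print(arr[-1])        # last element: 50
print(arr[:2])        # first two elements: [10 20]
print(arr[::2])       # every second element: [10 30 50]

arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
print(arr_2d[1, 2])   # row 1, column 2: 6
print(arr_2d[0])      # first row: [1 2 3]
print(arr_2d[:, 1])   # second column: [2 5]
```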


# Pandas

- Pandas is a powerful library for data manipulation and analysis in Python.
- It provides tools to work with structured data like tables, making it essential for data preprocessing and analysis.
- It offers two main data structures, Series and DataFrame, to handle tabular data efficiently.

Pandas Official Documentation: https://pandas.pydata.org/docs/

Pandas Tutorials: https://pandas.pydata.org/docs/getting_started/intro_tutorials/index.html


---

To use Pandas, install it and then import it as follows:

In [8]:
!pip install pandas



In [9]:
import pandas as pd

### Series - One-Dimensional Data

- A Series is a one-dimensional array-like object containing data and an associated index.
- Create one with the pd.Series() constructor.


In [10]:
# Create a Series
data = [10, 20, 30, 40, 50]
series = pd.Series(data)
print(series)

0    10
1    20
2    30
3    40
4    50
dtype: int64
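
The index does not have to be the default 0, 1, 2, and so on. As a small sketch, the example below passes custom labels through the `index` parameter and looks values up by label and by position. The name `prices` and the labels are made up for illustration.

```python
import pandas as pd

prices = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(prices)

print(prices["b"])         # label-based lookup: 20
print(prices.iloc[0])      # position-based lookup: 10
print(list(prices.index))  # ['a', 'b', 'c']
```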


### DataFrame - Two-Dimensional Data

- A DataFrame is a 2D labeled data structure with columns that can hold different data types.
- Create one with the pd.DataFrame() constructor.


In [15]:
# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 22]}
data = pd.DataFrame(data)
print(data)

      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   22
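
To illustrate columns holding different data types, the sketch below adds a floating-point column and inspects the result. The name `people` and the Height values are made up for this example and are not part of the original data.

```python
import pandas as pd

people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 22],
    "Height": [1.65, 1.80, 1.75],   # made-up values for illustration
})

print(people.shape)           # (3, 3): 3 rows, 3 columns
print(list(people.columns))   # ['Name', 'Age', 'Height']
print(people.dtypes)          # one dtype per column

# Selecting a single column returns a Series
print(type(people["Age"]))
```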


### Reading Data

- Pandas can read data from various file formats using functions like pd.read_csv(), pd.read_excel(), and more.

- The file must exist before you read it. The name 'data.csv' in the cell below stands for a CSV file in the current working directory.

In [None]:
# Read data from a CSV file into a DataFrame (this overwrites the data variable from the previous cell)
data = pd.read_csv('data.csv')
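
If no CSV file is at hand, the reading step can still be tried out. The sketch below assumes the file would contain the same Name and Age columns used earlier and feeds the text to `pd.read_csv` through `io.StringIO`, since `read_csv` accepts a file path or a file-like object. The names `csv_text` and `df` are illustrative.

```python
import io
import pandas as pd

# Inline CSV text standing in for the contents of a file such as 'data.csv'
csv_text = """Name,Age
Alice,25
Bob,30
Charlie,22
"""

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df)
print(df.shape)   # (3, 2)
```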

### Data Exploration

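- Pandas provides methods to explore and summarize data: head() shows the first rows, describe() summarizes the numeric columns, and value_counts() counts how often each value occurs.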


In [17]:
# Display the first rows (head() returns 5 by default)
print(data.head())

# Display basic statistics for the numeric columns
print(data.describe())

# Count how often each value occurs in a column
print(data['Name'].value_counts())

      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   22
             Age
count   3.000000
mean   25.666667
std     4.041452
min    22.000000
25%    23.500000
50%    25.000000
75%    27.500000
max    30.000000
Alice      1
Bob        1
Charlie    1
Name: Name, dtype: int64


### Data Selection and Filtering

- You can select columns, filter rows, and perform conditional operations on data.

In [21]:
# Select a specific column
column = data['Age']
print(column)

# Filter rows based on a condition
filtered_data = data[data['Age'] > 25]
print("People older than 25 \n", filtered_data)

0    25
1    30
2    22
Name: Age, dtype: int64
People older than 25 
   Name  Age
1  Bob   30
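
Conditions can also be combined. As a sketch on the same three-row table, the example below uses `&` to require two conditions at once and `.loc` to filter rows and pick a column in one step. The names `df`, `mid_twenties`, and `names_over_22` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"],
                   "Age": [25, 30, 22]})

# Combine conditions with & (and) or | (or); each condition needs parentheses
mid_twenties = df[(df["Age"] > 22) & (df["Age"] < 30)]
print(mid_twenties)

# .loc selects rows by a condition and columns by label in one step
names_over_22 = df.loc[df["Age"] > 22, "Name"]
print(names_over_22.tolist())   # ['Alice', 'Bob']
```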


# Scikit-Learn

- Scikit-Learn simplifies the process of implementing machine learning algorithms.
- It offers a consistent API across algorithms and tasks.
- It provides tools for data preprocessing, feature selection, and model evaluation.

---

Install scikit-learn, then import it. The package is installed under the name scikit-learn but imported as sklearn:

In [22]:
! pip install scikit-learn



In [23]:
import sklearn

## The Scikit-Learn Workflow

- Scikit-Learn follows a consistent workflow for machine learning tasks.
- Split your data into features (X) and target (y), then into training and testing sets.
- Import a model class, create an instance, and train it with fit().
- Make predictions with predict() and evaluate the model's performance.
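
The steps above can be sketched on a toy dataset before moving to real data. The four samples below are made up for illustration, and the train/test split is skipped because the data is so small. Treat this as a minimal sketch of the fit and predict pattern, not a realistic experiment.

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy data (made up for illustration): one feature, two classes
X = [[0.0], [1.0], [2.0], [3.0]]   # features: one row per sample
y = [0, 0, 1, 1]                   # target: one label per sample

model = KNeighborsClassifier(n_neighbors=1)  # create an instance
model.fit(X, y)                              # train it

predictions = model.predict([[0.4], [2.6]])  # predict for new samples
print(predictions)                           # [0 1]

# score() returns the mean accuracy on the given data
print(model.score(X, y))                     # 1.0
```

Other estimators expose the same fit() and predict() methods, which is why the classification and regression examples look so similar.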


### Classification Example

In [24]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
data = load_iris()
X = data.data
y = data.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a k-nearest neighbors classifier
clf = KNeighborsClassifier(n_neighbors=3)

# Train the classifier
clf.fit(X_train, y_train)

# Make predictions
y_pred = clf.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Accuracy: 1.0


### Regression Example

In [26]:
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Load the Diabetes dataset
data = load_diabetes()
X = data.data
y = data.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a linear regression model
reg = LinearRegression()

# Train the model
reg.fit(X_train, y_train)

# Make predictions
y_pred = reg.predict(X_test)

# Evaluate mean squared error
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)


Mean Squared Error: 2900.193628493482
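
A raw mean squared error is hard to read because it is in squared units of the target. As a possible follow-up, the sketch below repeats the same split and model, then reports the root mean squared error (in the units of the target) and the R² score from `sklearn.metrics.r2_score`. The variable names `rmse` and `r2` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Same data, split, and model as in the regression example
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

reg = LinearRegression().fit(X_train, y_train)
y_pred = reg.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)             # same units as the target
r2 = r2_score(y_test, y_pred)   # 1.0 would be a perfect fit

print("RMSE:", rmse)
print("R^2:", r2)
```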
