#### Python for Machine Learning

In [3]:
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

In [2]:
df = pd.read_csv('../Data/iris_data.csv')
print(f"\nrows: {df.shape[0]}; cols: {df.shape[1]}\n")
print(df.species.value_counts())
df = df.round(2)
# Keep two sample rows per species, using the public pd.concat instead of the private _append.
df = pd.concat([df[:2], df[50:52], df[-2:]])
print("\nSample rows:")
df


rows: 150; cols: 5

species
Iris-setosa        50
Iris-versicolor    50
Iris-virginica     50
Name: count, dtype: int64

Sample rows:


|     | sepal_length | sepal_width | petal_length | petal_width | species |
| --- | --- | --- | --- | --- | --- |
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 50 | 7.0 | 3.2 | 4.7 | 1.4 | Iris-versicolor |
| 51 | 6.4 | 3.2 | 4.5 | 1.5 | Iris-versicolor |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 | Iris-virginica |
| 149 | 5.9 | 3.0 | 5.1 | 1.8 | Iris-virginica |


In [7]:
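color = 'lightgrey'

def highlight_cols(_):
    # Return the same CSS rule for every cell in the selected subset.
    return f'background-color: {color}'

df.columns = ['X1', 'X2', 'X3', 'X4', 'Y']

print("Independent Variables    Dependent")
print(" Feature Set X1 to X4       Target")

# Shade the target column. The Styler must be the last expression to be displayed;
# in pandas >= 2.1, Styler.applymap is deprecated in favor of Styler.map.
df.style.applymap(highlight_cols, subset=pd.IndexSlice[:, ['Y']])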

Independent Variables    Dependent
 Feature Set X1 to X4       Target


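|     | X1 | X2 | X3 | X4 | Y |
| --- | --- | --- | --- | --- | --- |
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 50 | 7.0 | 3.2 | 4.7 | 1.4 | Iris-versicolor |
| 51 | 6.4 | 3.2 | 4.5 | 1.5 | Iris-versicolor |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 | Iris-virginica |
| 149 | 5.9 | 3.0 | 5.1 | 1.8 | Iris-virginica |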
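
For supervised learning, the renamed columns are usually split into a feature matrix and a target vector. The sketch below re-enters the six sample rows by hand so that it runs on its own; the names `feature_cols`, `X`, and `y` are illustrative choices, not part of the original notebook.

```python
import pandas as pd

# The six sample rows shown above, typed in by hand for illustration.
df = pd.DataFrame(
    {
        'X1': [5.1, 4.9, 7.0, 6.4, 6.2, 5.9],
        'X2': [3.5, 3.0, 3.2, 3.2, 3.4, 3.0],
        'X3': [1.4, 1.4, 4.7, 4.5, 5.4, 5.1],
        'X4': [0.2, 0.2, 1.4, 1.5, 2.3, 1.8],
        'Y': ['Iris-setosa', 'Iris-setosa',
              'Iris-versicolor', 'Iris-versicolor',
              'Iris-virginica', 'Iris-virginica'],
    },
    index=[0, 1, 50, 51, 148, 149],
)

feature_cols = ['X1', 'X2', 'X3', 'X4']  # independent variables
X = df[feature_cols]                     # feature matrix: one row per sample
y = df['Y']                              # target vector: one label per sample

print(X.shape, y.shape)
```

Keeping `X` two-dimensional and `y` one-dimensional matches the input layout that scikit-learn estimators expect.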


#### Data for Unsupervised Learning   
Only features are input. No labels.  

In [9]:
df[['X1', 'X2', 'X3', 'X4']]

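|     | X1 | X2 | X3 | X4 |
| --- | --- | --- | --- | --- |
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
| 50 | 7.0 | 3.2 | 4.7 | 1.4 |
| 51 | 6.4 | 3.2 | 4.5 | 1.5 |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 |
| 149 | 5.9 | 3.0 | 5.1 | 1.8 |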
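
As a rough illustration of learning from features alone, the sketch below clusters the measurements with k-means. It assumes that scikit-learn's bundled Iris data can stand in for `iris_data.csv`; the choice of k-means and of three clusters is an assumption made for this example, not something the notebook prescribes.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled Iris data stands in for iris_data.csv.
iris = load_iris()
X = pd.DataFrame(iris.data, columns=['X1', 'X2', 'X3', 'X4'])

# Only the feature matrix is passed in; the species labels are never used.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)

print(X.shape)
print(pd.Series(cluster_ids).value_counts().sort_index())
```

The cluster numbers are arbitrary identifiers; they need not line up with the species order.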


#### Python for Machine Learning   
- Python has a large ecosystem of libraries and tools for machine learning.  
    Some of the most popular are scikit-learn, TensorFlow, Keras, and PyTorch.   
    These libraries provide pre-built models and algorithms that can be easily integrated into machine learning projects.  
- As a general-purpose language, Python can also be used for a wide range of applications beyond machine learning.     
##### Data pre-processing and feature selection:  
Before training a model, it's important to prepare the data.  
This includes cleaning, transforming, and normalizing the data to remove inconsistencies   
and ensure that it is suitable for the machine learning model.  
**Feature selection** means choosing the most relevant features of the data to use for training the model.  
This is important because irrelevant or redundant features can lead to overfitting, which reduces the model's accuracy on new data.  
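
A minimal sketch of these two steps with scikit-learn follows. The scaler (`StandardScaler`), the scoring function (`f_classif`), and `k=2` are assumptions chosen for illustration; other transformers and selectors would fit the description equally well.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Normalize: rescale each feature to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)

# Feature selection: keep the k features with the highest ANOVA F-score.
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X_scaled, y)

print(X_scaled.mean(axis=0).round(6))  # per-feature means after scaling
print(selector.get_support())          # boolean mask of the kept features
print(X_selected.shape)
```

In a real project the scaler and selector would normally be fitted on the training split only, so that no information from the test set leaks into the model.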
##### Model selection, training, and evaluation:   
The choice of machine learning model depends on the problem being solved and the type of data being used.  
The selected model is trained on the data.  
During training, the model learns the patterns and relationships in the data that it will use to make predictions on new data.   

The **evaluation** process measures the performance of the trained model.  
This is done by testing the model on a separate set of data, known as the test set, that was not used during training.   
Common metrics for evaluating a classification model include accuracy, precision, recall, and F1-score.  
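
The train-then-evaluate cycle can be sketched as follows. The model (`LogisticRegression`), the 70/30 split, and macro averaging are assumptions made for the example; the right choices depend on the problem at hand.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 30% of the rows as a test set that the model never sees in training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)     # learn patterns from the training set
y_pred = model.predict(X_test)  # predict labels for unseen rows

accuracy = accuracy_score(y_test, y_pred)
# Macro averaging computes each metric per class, then takes the unweighted mean.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average='macro'
)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```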

By understanding these fundamental principles of machine learning in Python,  
you can start building your own machine learning models to solve real-world problems.  

##### Python Libraries for Machine Learning  
One of the most widely used libraries for machine learning in Python is scikit-learn.  
It provides a range of algorithms and tools for supervised and unsupervised learning,  
as well as for data pre-processing, model selection, and evaluation.   
Other popular Python libraries for machine learning include TensorFlow, PyTorch, and Keras; Theano was also widely used before its original developers ended major development in 2017.  
Each of these libraries has its own strengths and weaknesses,   
and the choice of library depends on the specific requirements of the machine learning project.   

![](../Figures/ML_Tools.PNG)  

##### NumPy  
1. Fundamental library for numerical computing.  
2. Provides efficient data structures for large multi-dimensional arrays and matrices.  
3. Offers a wide range of mathematical functions and operations on arrays.  
4. Enables seamless integration with other libraries and tools for data manipulation and analysis.   
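
A short sketch of these points, using the six sample rows from the earlier table as a 2-D array. The variable names are illustrative.

```python
import numpy as np

# Six samples (rows) by four features (columns), taken from the sample table.
X = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.4, 3.2, 4.5, 1.5],
    [6.2, 3.4, 5.4, 2.3],
    [5.9, 3.0, 5.1, 1.8],
])

col_means = X.mean(axis=0)    # one mean per feature column
centered = X - col_means      # broadcasting subtracts the means from every row
ratio = X[:, 2] / X[:, 3]     # element-wise division of two columns, no loop

print(X.shape)
print(col_means)
print(ratio)
```

Operations like these run in compiled code over whole arrays, which is why libraries such as pandas and scikit-learn build on NumPy arrays.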
##### Scikit-Learn  
1. Offers easy-to-use and efficient implementations of various classification, regression, clustering, and dimensionality reduction techniques.  
2. Supports data preprocessing, feature selection, and model evaluation.  
3. Integrates well with other Python libraries and frameworks.  
##### Keras  
1. A high-level deep learning API that simplifies creating and training neural network models.  
2. Has a user-friendly API, an intuitive interface, and pre-built neural network components.  
3. Supports both convolutional and recurrent neural networks.  
4. Runs on top of a backend engine: older multi-backend releases supported TensorFlow, Theano, and CNTK, while Keras 3 supports TensorFlow, JAX, and PyTorch.  


##### Machine Learning is Complex   
Scientists from the University of Cambridge and Johns Hopkins University have   
successfully mapped every neuron and connection within the brain of a fruit fly larva.  
The process took 12 years.  
The larva's brain is remarkably complex, with 3,016 neurons and 548,000 connections between them.  
The scientists identified 93 distinct types of neurons that varied in shape, function, and connectivity.   
Although fruit flies have vastly different brains from humans, they share a basic biology and genetic foundation,  
making the discovery a good starting point for exploring the mysteries of the human brain.  - 13 Mar 2023.

##### Real-World Applications of Machine Learning Tools in Python  
Python's machine learning tools are widely used in many industries, including finance, healthcare, marketing, and manufacturing.   
For example:  
- In finance, ML is used for fraud detection, credit risk assessment, and algorithmic trading.  
- In healthcare, ML is used for disease diagnosis and drug discovery.   
- In marketing, ML is used for customer segmentation and personalized recommendations.   
- In manufacturing, ML is used for quality control and predictive maintenance.   

"_Oticon has introduced its newest hearing aid device, Oticon More™, the first **hearing aid with an on-board deep neural network**. Oticon More was trained—using 12 million-plus real-life sounds—so that people wearing it can better understand speech and the sounds around them._"

#### Data Sets  
##### Kaggle   
- Many data sets used in papers published online come from Kaggle.  
- Kaggle also hosts pre-trained models.

"Pre-trained models are an important part of modern ML workflows, but it’s not always easy to find the best one for your needs. With the launch of Kaggle Models, we’re making thousands of models from Google Research, Deepmind, and other sources more accessible to the ML developer community alongside our ML resource hub of 200k+ public datasets. To showcase how to use Kaggle Models to improve ML projects, we’ll demo how to use an open source model from Kaggle and use it to submit to a competition."  

<u>Popular Datasets</u>:   
Iris, Penguin, Titanic, Tips, Housing Prices, Bicycle Usage, Wine Quality, Breast Cancer,        
MNIST (Modified National Institute of Standards and Technology) database   
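
Several of these data sets ship with scikit-learn, so they can be loaded without a download. The sketch below is only illustrative: note that `load_wine` returns the wine *recognition* data set (cultivar classification), which is not the same as the Wine Quality data set, and that data sets such as Titanic, Tips, and Penguin come from other sources.

```python
from sklearn.datasets import load_breast_cancer, load_iris, load_wine

# Map a display name to its scikit-learn loader function.
loaders = {
    'Iris': load_iris,
    'Wine (recognition)': load_wine,
    'Breast Cancer': load_breast_cancer,
}

shapes = {}
for name, loader in loaders.items():
    data = loader(as_frame=True)     # as_frame=True returns pandas objects
    shapes[name] = data.frame.shape  # frame = feature columns plus a target column
    print(f"{name}: {shapes[name][0]} rows, {shapes[name][1]} columns")
```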

<u>Public Data</u>:   
Government, Research Institutes, NGOs, Organisations   

#### [Akin's Laws of Spacecraft Design](https://spacecraft.ssl.umd.edu/akins_laws.html)   
#1. Engineering is done with numbers. <u>Analysis without numbers is only an opinion</u>.   
#9. Not having all the information you need is never a satisfactory excuse for not starting the analysis.   

#21. (Larrabee's Law) Half of everything you hear in a classroom is crap. Education is figuring out which half is which.  
#43. You really understand something the third time you see it (or the first time you teach it.)

#### Basics of Scikit-Learn API   

1. Choose a class of model by importing the appropriate estimator class from Scikit-Learn.
2. Choose model hyperparameters by instantiating this class with desired values.
3. Arrange data into a features matrix and target vector following the discussion above.
4. Fit the model to your data by calling the fit() method of the model instance.
5. Apply the model to new data:
    - For supervised learning, we often predict labels for unknown data using the predict() method.
    - For unsupervised learning, we often transform or infer properties of the data using the transform() or predict() method.
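
The five steps can be sketched as follows, once for a supervised estimator and once for an unsupervised one. The estimators (`KNeighborsClassifier`, `PCA`), their hyperparameter values, and the `new_flower` measurements are assumptions picked for illustration.

```python
# Step 1: choose a class of model by importing the estimator class.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

# Step 3: arrange the data into a features matrix X and a target vector y.
X, y = load_iris(return_X_y=True)

# --- Supervised: predict labels ---
clf = KNeighborsClassifier(n_neighbors=5)  # Step 2: set hyperparameters
clf.fit(X, y)                              # Step 4: fit the model to the data
new_flower = [[5.0, 3.4, 1.5, 0.2]]        # hypothetical new measurement
predicted = clf.predict(new_flower)        # Step 5: predict a label

# --- Unsupervised: transform the data, no labels needed ---
pca = PCA(n_components=2)                  # Step 2: set hyperparameters
pca.fit(X)                                 # Step 4: fit uses X only
X_2d = pca.transform(X)                    # Step 5: project onto 2 components

print(predicted)
print(X_2d.shape)
```

Because every estimator follows the same instantiate, `fit`, then `predict` or `transform` pattern, swapping one model for another usually changes only the import and the constructor call.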