# Artificial Intelligence

- is a field of computer science dedicated to creating machines that can **mimic human intelligence**
- or that can perform tasks that typically require human intelligence
- This definition leaves scope for **traditional programming** to *occasionally* be called AI as well.
- **Traditional Programming** - systems that follow simple, hand-written `if-else` rules
- Whether such simple if-else systems count as AI is a matter of opinion
- What absolutely WILL be AI:
- The major giveaway in differentiating AI from traditional programming is:
    - **LEARNING** - does the system learn over time and adapt its behaviour?
    - **Recommendation Systems** - software that recommends content to users, e.g. Instagram, YouTube, Spotify, Shopee, TikTok, Netflix...
        - they recommend to the user what the user is likely to want

- Another giveaway is:
    - the **more data** you feed to the system, the **better** its performance becomes
- **AUTOMATION** - nowadays often makes use of AI
    - but automation by itself is not specifically AI
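The difference between traditional programming and learning can be sketched on the Iris dataset (introduced later in these notes). This is a minimal sketch assuming scikit-learn is installed. The `rule_based` function and its thresholds (2.5 cm, 1.75 cm) are hypothetical, hand-picked by a human for illustration. The decision tree finds its own thresholds from the data. Both are scored on the same data they were built from, so the numbers only illustrate the idea and do not estimate real-world performance.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# columns: sepal length, sepal width, petal length, petal width (cm)
X, y = load_iris(return_X_y=True)

def rule_based(sample):
    """Traditional programming: hypothetical thresholds hand-picked by a human."""
    petal_length, petal_width = sample[2], sample[3]
    if petal_length < 2.5:
        return 0  # setosa
    elif petal_width < 1.75:
        return 1  # versicolor
    else:
        return 2  # virginica

rule_preds = [rule_based(s) for s in X]
rule_acc = sum(int(p == t) for p, t in zip(rule_preds, y)) / len(y)

# Learning: the model chooses its own thresholds by looking at the data
learned = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
learned_acc = learned.score(X, y)

print(f"hand-written rules: {rule_acc:.2f}")
print(f"learned rules:      {learned_acc:.2f}")
```

If the data changed, the hand-written rules would have to be rewritten by a person, while the learned model only needs to be fitted again on the new data.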

Categories of AI:
1. **Narrow AI** - (Weak AI)
- it can perform specific tasks and cannot function beyond that scope, e.g. Netflix, chatbots, Face ID, YouTube

2. **General AI** -  (Strong AI)
- a system that would theoretically possess human-like intelligence across most domains. A general AI could play chess, write books, or even perform surgery just as well as a human being, if not better.

Artificial General Intelligence (AGI) refers to a level of artificial intelligence that matches human intelligence in *understanding*, *learning*, and *problem-solving*. 

Unlike narrow AI, which excels at specific tasks like facial recognition or language translation, AGI would possess the ability to perform a wide range of cognitive tasks with adaptability and versatility.

![](https://assets.ibm.com/is/image/ibm/diagram-comparing-ai-ml-deep-learning-gen-ai:16x9?fmt=png-alpha&dpr=on%2C1.25&wid=960&hei=540)

![](https://cdn.accelebrate.com/images/misc/ai-ml-dl-venn-diagram.png)

- **Finance**: detecting fraudulent credit card transactions by spotting "weird" spending patterns
- **Healthcare**: analyzing X-rays to identify early signs of diseases that might be subtle to the human eye
- **Retail**: recommendation engines (like Amazon or Netflix) suggesting what you might want next
- **AI Systems**: self-driving cars interpreting camera feeds to stay in their lane

### **Machine Learning** 
- A sub-field of AI in which **machines learn from data**
- Things we do in ML:
    - **Training a model**: processed data → machine → **trained machine**
    - **Making predictions**: input → **trained machine** → predictions
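In scikit-learn, the two steps above correspond to a model's `fit` method (training) and `predict` method (making predictions). This is a minimal sketch assuming scikit-learn is installed. The tiny dataset is hypothetical, and `DecisionTreeClassifier` is just one model picked for illustration.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical processed data: [hours_studied, hours_slept] -> failed (0) / passed (1)
X_train = [[1, 4], [2, 5], [8, 7], [9, 8]]
y_train = [0, 0, 1, 1]

# Training a model: processed data -> machine -> trained machine
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# Making predictions: new input -> trained machine -> predictions
predictions = model.predict([[1, 5], [9, 7]])
print(predictions)  # [0 1]
```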

### **The ML Lifecycle**
- a series of steps that we follow to build a machine learning system:
1. **Problem Scoping**:
    - Define the problem we are solving, e.g. predicting whether a customer will keep our service or leave it (*customer churn*)
2. **Data Acquisition**:
    - Collect relevant data from various sources, e.g. websites, databases, APIs, spreadsheets
3. **Data Exploration** - **Exploratory Data Analysis**:
    - Examine the data, clean it if necessary, and analyse it using visualizations
4. **Modeling**
    - Select a suitable model and specify its *relevant parameters*, i.e. hyperparameters (switches that control the behaviour of a model)
5. **Training**
    - *Split the data* into two parts, one for training and the other for testing (the `train-test split`)
    - Fit the model on the training data to train it
6. **Testing**
    - Calculate performance metrics on the test data to measure how well the model is doing (*model performance*)
7. **Deployment**
    - Deploy the model somewhere people can use it, e.g. behind a custom UI, or on a cloud platform such as AWS, Azure or GCP
- If performance is good, we move onwards; otherwise we repeat from any previous step, e.g.:
    - re-split the data
    - tune the hyperparameters
    - redo EDA and check whether the data is good
    - clean the data
    - change the data or add more data
- That is why the whole process is called a life**cycle**
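The loop back to earlier steps can be sketched in code. This is a minimal sketch, assuming scikit-learn is installed. The Iris dataset, the k-nearest-neighbours model, the candidate `n_neighbors` values and the 0.9 "good enough" threshold are all assumptions made for illustration. When performance falls short, the sketch returns to modeling and tries another hyperparameter value.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# 2. Data acquisition: a dataset bundled with scikit-learn
X, y = load_iris(return_X_y=True)

# 5. Training (part 1): train-test split, 80% for training and 20% for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

TARGET_ACCURACY = 0.9  # hypothetical "good enough" threshold
results = {}

# Steps 4-6 in a loop: if performance is not good enough, go back and
# try another hyperparameter value (n_neighbors of a k-NN model)
for k in [1, 3, 5, 7, 9]:
    model = KNeighborsClassifier(n_neighbors=k)          # 4. Modeling
    model.fit(X_train, y_train)                          # 5. Training (part 2)
    acc = accuracy_score(y_test, model.predict(X_test))  # 6. Testing
    results[k] = acc
    print(f"n_neighbors={k}: test accuracy = {acc:.3f}")
    if acc >= TARGET_ACCURACY:
        break  # good enough: move on to 7. Deployment
```

Choosing hyperparameters by looking at the test score is a simplification. In practice, a separate validation set or cross-validation is usually preferred, so that the test set stays an honest final check.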



Let's take a small problem and implement the ML steps on it.
- you might not understand everything yet, and that's okay
- supervised and unsupervised learning will be covered in a future class


How many people have heard of the dataset called **Iris**?

The Iris dataset has 4 feature columns plus the target column:

| # | Column | Data type |
|---|--------|-----------|
| 1 | petal length (cm) | numerical |
| 2 | petal width (cm) | numerical |
| 3 | sepal length (cm) | numerical |
| 4 | sepal width (cm) | numerical |
| 5 | species (target) | string |


#### **1. Problem Scoping**
- I have data about a certain type of flower (the iris)
- different flower dimensions correspond to different species
- I need a machine that can predict which species a flower belongs to, based on the 4 flower dimensions mentioned above
- input → PL, PW, SL, SW (petal length, petal width, sepal length, sepal width)
- prediction → species: setosa, versicolor or virginica

#### **2. Data Acquisition**
- the Iris dataset is also bundled with various libraries, e.g. scikit-learn
- alternatively, download the dataset file and import it using pandas
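Both routes can end in a pandas DataFrame. This is a minimal sketch, assuming a recent scikit-learn (one that supports the `as_frame` option of `load_iris`) and pandas. The file name `iris.csv` in the comment is hypothetical.

```python
from sklearn.datasets import load_iris

# as_frame=True returns pandas objects instead of plain NumPy arrays (requires pandas)
iris_bunch = load_iris(as_frame=True)
df = iris_bunch.frame  # the 4 feature columns plus a 'target' column in one DataFrame

print(df.shape)  # 150 rows, 5 columns
print(df.columns.tolist())

# For a downloaded copy instead, something like this would work,
# where "iris.csv" is a hypothetical file name:
# import pandas as pd
# df = pd.read_csv("iris.csv")
```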

```python
# pip install scikit-learn   (run once if scikit-learn is not installed)
```

```python
from sklearn.datasets import load_iris
# other toy datasets bundled with scikit-learn:
# load_diabetes, load_digits, load_breast_cancer

iris = load_iris()
```

```python
iris['data']          # feature values: 150 rows x 4 columns
iris['target']        # species, encoded as 0, 1, 2
iris.feature_names    # names of the data/input columns
iris.target_names     # names of the classes in the target column
# in a notebook, only the value of the last line is displayed
```

```
array(['setosa', 'versicolor', 'virginica'], dtype='<U10')
```

```python
iris.feature_names
```

```
['sepal length (cm)',
 'sepal width (cm)',
 'petal length (cm)',
 'petal width (cm)']
```

```python
iris.target

# index target_names with a list of encoded labels to get the species names
iris.target_names[[0,0,0,0,1,1,1,1,2,2,2,0,2,1,2,0,1]]
```

```
array(['setosa', 'setosa', 'setosa', 'setosa', 'versicolor', 'versicolor',
       'versicolor', 'versicolor', 'virginica', 'virginica', 'virginica',
       'setosa', 'virginica', 'versicolor', 'virginica', 'setosa',
       'versicolor'], dtype='<U10')
```

```python
species = iris.target
species

# iris.target_names[iris.target]   # would give the species names instead
```

```
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
```

```python
import pandas as pd

irisdf = pd.DataFrame(iris.data, columns=iris.feature_names)

# alternative: store the species names instead of the encoded labels
# irisdf['species'] = iris.target_names[iris.target]

irisdf['species'] = iris.target
```

#### **3. Data Exploration and EDA**
- Check basic information and statistics

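```python
irisdf
# species holds encoded labels: 0 = setosa, 1 = versicolor, 2 = virginica
```

| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |
| ... | ... | ... | ... | ... | ... |
| 145 | 6.7 | 3.0 | 5.2 | 2.3 | 2 |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 | 2 |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 | 2 |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 | 2 |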


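```python
irisdf.info()

# missing values per column; in a notebook only the last expression is
# displayed, so wrap this in print() to see the counts
irisdf.isna().sum()

irisdf.describe()
```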

```
<class 'pandas.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   sepal length (cm)  150 non-null    float64
 1   sepal width (cm)   150 non-null    float64
 2   petal length (cm)  150 non-null    float64
 3   petal width (cm)   150 non-null    float64
 4   species            150 non-null    int64  
dtypes: float64(4), int64(1)
memory usage: 6.0 KB
```


| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | species |
|---|---|---|---|---|---|
| count | 150.0 | 150.0 | 150.0 | 150.0 | 150.0 |
| mean | 5.843333 | 3.057333 | 3.758 | 1.199333 | 1.0 |
| std | 0.828066 | 0.435866 | 1.765298 | 0.762238 | 0.819232 |
| min | 4.3 | 2.0 | 1.0 | 0.1 | 0.0 |
| 25% | 5.1 | 2.8 | 1.6 | 0.3 | 0.0 |
| 50% | 5.8 | 3.0 | 4.35 | 1.3 | 1.0 |
| 75% | 6.4 | 3.3 | 5.1 | 1.8 | 2.0 |
| max | 7.9 | 4.4 | 6.9 | 2.5 | 2.0 |
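Two more quick EDA checks are worth printing explicitly: missing values per column and class balance. This is a minimal sketch, assuming scikit-learn and pandas are installed. It rebuilds `irisdf` the same way as above so that it runs on its own.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
irisdf = pd.DataFrame(iris.data, columns=iris.feature_names)
irisdf['species'] = iris.target

# Missing values per column, printed so it shows up even when not the last line
missing = irisdf.isna().sum()
print(missing)

# Class balance: number of rows per species (encoded labels mapped back to names)
counts = pd.Series(iris.target_names[iris.target]).value_counts()
print(counts)
```

A balanced dataset like this one (equal rows per class) makes plain accuracy a reasonable first metric. With heavily imbalanced classes, accuracy alone can be misleading.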


```python
print(iris.DESCR)

# DESCR is an attribute of the sklearn Bunch object, NOT of the DataFrame
```

```text
.. _iris_dataset:

Iris plants dataset
--------------------

**Data Set Characteristics:**

:Number of Instances: 150 (50 in each of three classes)
:Number of Attributes: 4 numeric, predictive attributes and the class
:Attribute Information:
    - sepal length in cm
    - sepal width in cm
    - petal length in cm
    - petal width in cm
    - class:
            - Iris-Setosa
            - Iris-Versicolour
            - Iris-Virginica

:Summary Statistics:

                Min  Max   Mean    SD   Class Correlation
sepal length:   4.3  7.9   5.84   0.83    0.7826
sepal width:    2.0  4.4   3.05   0.43   -0.4194
petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)

:Missing Attribute Values: None
:Class Distribution: 33.3% for each of 3 classes.
:Creator: R.A. Fisher
:Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
:Date: July, 1988

The famous Iris database, first used by Sir R.A. Fisher. The dataset is taken
from Fis
```
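The DESCR summary lists a "Class Correlation" for each feature. A rough way to compute something similar is the Pearson correlation between each feature and the encoded species label. This is a sketch under that assumption: the DESCR does not say exactly how its column was computed, so the numbers printed here may not match it exactly. The general pattern should still show: petal measurements correlate strongly with species, and sepal width correlates negatively.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
irisdf = pd.DataFrame(iris.data, columns=iris.feature_names)
irisdf['species'] = iris.target

# Pearson correlation of every feature with the encoded species label (0, 1, 2)
class_corr = irisdf.corr()['species'].drop('species')
print(class_corr.round(4))
```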

- We also do data visualizations as part of EDA (we have already visualized the Iris dataset)
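A typical EDA plot for Iris is a scatter of petal length against petal width, coloured by species. This is a minimal sketch, assuming matplotlib and scikit-learn are installed. The choice of these two features follows the high class correlations in the DESCR summary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

fig, ax = plt.subplots(figsize=(6, 4))
# one scatter call per species so each gets its own colour and legend entry
for label, name in enumerate(iris.target_names):
    mask = y == label
    ax.scatter(X[mask, 2], X[mask, 3], label=name, alpha=0.7)

ax.set_xlabel(iris.feature_names[2])  # petal length (cm)
ax.set_ylabel(iris.feature_names[3])  # petal width (cm)
ax.set_title("Iris: petal length vs petal width")
ax.legend()
# in a Jupyter notebook, drop the matplotlib.use(...) line and call plt.show()
```

Setosa should appear as a separate cluster, while versicolor and virginica overlap slightly. This is why those two classes are the harder ones to tell apart.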



#### **4. Modeling**
- modeling in ML means choosing a model
- and choosing its best parameters (hyperparameters)
    - not to be confused with *data modeling*, which is a concept in databases, analytics etc.
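Both parts of modeling (choosing a model and choosing its best hyperparameters) can be sketched with scikit-learn's `GridSearchCV`. This is a minimal sketch, assuming scikit-learn is installed. The decision tree and the grid of `max_depth` / `min_samples_leaf` values are assumptions chosen for illustration, not the course's prescribed model.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# 1) choose a model
model = DecisionTreeClassifier(random_state=42)

# 2) choose its best hyperparameters: every combination in the grid is scored
#    with 5-fold cross-validation on the training data only
param_grid = {"max_depth": [1, 2, 3, 4, None], "min_samples_leaf": [1, 3, 5]}
search = GridSearchCV(model, param_grid, cv=5)
search.fit(X_train, y_train)

print("best hyperparameters:", search.best_params_)
print("test accuracy:", round(search.score(X_test, y_test), 3))
```

Because the search uses only the training data, the final test score remains an honest estimate for the Testing step.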