# Tutorial 1 (Introduction to AI)

## Introduction to Python and AI

In the "Principles of AI" module, we will use **Python** as our programming language because it has an intuitive syntax, simple control flow, and built-in data structures. Python is also interpreted, so code runs directly without a separate compilation step. This makes Python especially useful for prototyping AI algorithms.


Python has a very large ecosystem of third-party libraries, many of them aimed at artificial intelligence. Examples include TensorFlow (a deep learning framework), Keras (a high-level neural network library), PyTorch, Scikit-learn, CNTK, and Theano, and the list keeps growing.

In this module we will use the following software:
    
* **Python** - The programming language.
* **Pandas** - Allows for data preprocessing.  Tutorial [here](http://pandas.pydata.org/pandas-docs/version/0.18.1/tutorials.html)
* **Scikit-Learn** - Machine learning framework for Python.  Tutorial [here](http://scikit-learn.org/stable/tutorial/basic/tutorial.html).

and potentially:

* **Keras** - [Keras](https://github.com/fchollet/keras) is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. 
* **TensorFlow** - Google's deep learning framework. Use the version pinned in the package list in the installation section below.



## Code and Development in Python

Tutorials for this module will be presented as **Jupyter Notebooks**.  For your own development work you should use the **Spyder** IDE.  Both of these come with the **Anaconda** distribution and you should install this on your own machines.

## Installing Anaconda

Go to the download page for Anaconda, https://www.anaconda.com/distribution/ and follow the instructions.

After installation, Jupyter Notebooks can be launched from the menu, or by the command in the console:

`jupyter notebook`

You might need to specify a folder on launch.  For example:

`jupyter notebook --notebook-dir D:/my_works/`

The page should open in the browser within a few seconds. If it does not, go to http://localhost:8888/, which is where the local Jupyter Notebook server usually runs.

The following packages are needed for this module:

```
conda install scipy
pip install --upgrade scikit-learn
pip install --upgrade pandas
pip install --upgrade pandas-datareader
pip install --upgrade matplotlib
pip install --upgrade pillow
pip install --upgrade requests
pip install --upgrade h5py
pip install --upgrade tensorflow==1.12.0
pip install --upgrade keras==2.2.4
```


## Working with Spyder

To create a new file, select New File from the File menu, and rename as you wish.

- Spyder comes with a helpful debugging tool that lets you step through your code, among other things.
- Output is written to the console, and this might well be graphics and diagrams, as well as print statements.


##  Working with Jupyter Notebook

To create a new file, click New -> Python.

The notebook consists of cells which are text (Markdown) and code (Code). If run, the code cells of a notebook make up a single program.  You can select a cell type on the control panel.

Working with cells:
<ul>
    <li> Select a cell - click on it</li>
    <li> Editing - click on it twice</li>
    <li> Running the cell - `SHIFT+ENTER` or click on the button <button class='fa fa-play icon-play btn btn-xs btn-default'></button> on the panel</li>
    <li> Adding a new cell - click on the button <button class='fa fa-plus icon-plus btn btn-xs btn-default'></button> on the panel</li>
    <li> Deleting a cell - click on the button <button class='fa fa-cut icon-cut btn btn-xs btn-default'></button> on the panel</li>
    <li> Moving a cell - click on the vertical arrows</li>
</ul>

There are two commonly used versions of Python: **Python 2** and **Python 3**. They are similar, but the differences make them **incompatible**: programs written for one version may not work in the other.

In the **"Introduction to AI"** module we use **Python 3**. 
 
The exact Python version is not critical, but it should be 3.6 or later, because the examples in this tutorial use f-strings.

**Note:** You can check your Python version at the command line by running `python --version`.

If you use any of the Linux distributions then Python is probably already installed. Try the following commands in the terminal to start interactive mode:

`python` or `python3` or `python2`

Exit: `Ctrl+D`

To run the code in a file such as `main.py`:

`python main.py`

Help: **`help(X)`**, where `X` is the object you need help with

Help exit: `q`.

In [7]:
# What version of Python do you have?
import sys

#import keras
import pandas as pd
import sklearn as sk
#import tensorflow as tf

#print(f"Tensor Flow Version: {tf.__version__}")
#print(f"Keras Version: {keras.__version__}")
#print()
print(f"Python {sys.version}")
print(f"Pandas {pd.__version__}")
print(f"Scikit-Learn {sk.__version__}")

Python 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:15:42) [MSC v.1916 64 bit (AMD64)]
Pandas 1.3.3
Scikit-Learn 1.0


# What is the "Principles of AI" module about?

- The "Principles of AI" module introduces artificial intelligence and neural computing as both technical subjects and as fields of intellectual activity. 


- We introduce neural computing as an alternative knowledge acquisition/representation paradigm, to explain its basic principles and to describe a range of neural computing techniques and their application areas.


- The focus of this module is deep learning, a very popular type of machine learning based on the original neural networks popularized in the 1980s. There is very little difference between how a deep neural network and an original neural network are calculated: a deep neural network is simply a neural network with many layers.  While we have always been able to create and calculate deep neural networks, we lacked an effective means of training them.  Deep learning provides an efficient means to train deep neural networks.


If deep learning is a type of machine learning, this raises the question, "What is machine learning?"  The following diagram illustrates how machine learning differs from traditional software development.

![class_1_ml_vs_trad.png](attachment:class_1_ml_vs_trad.png)

* **Traditional Software Development** - Programmers create programs that specify how to transform input into the desired output.
* **Machine Learning** - Programmers create models that can learn to produce the desired output for given input. This learning fills the traditional role of the computer program. 
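
As a minimal sketch of this contrast, the first function below is a rule written by hand, while the second part recovers the same rule from example input/output pairs using ordinary least squares. The temperature-conversion task, the function names, and the four example pairs are illustrative assumptions and not part of the module material.

```python
# Traditional approach: the programmer writes the rule explicitly.
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32


# Machine-learning approach: estimate the rule from example pairs.
# These (input, output) pairs are illustrative training data.
examples = [(0, 32.0), (10, 50.0), (20, 68.0), (30, 86.0)]


def fit_line(pairs):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(examples)
print(f"learned rule: f = {slope:.2f} * c + {intercept:.2f}")
print(celsius_to_fahrenheit(25), slope * 25 + intercept)
```

In the second case nobody typed the constants 1.8 and 32; they were estimated from the data, which is the role the learned model plays in place of a hand-written program.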

### Python basics

#### 1. Count to 10

Use a `for` loop and a `range`.

In [8]:
# range excludes its end value, so use 11 to count up to 10
for x in range(1, 11):  # If you ever see xrange, you are in Python 2
    print(x)  # If you ever see print x (no parentheses), you are in Python 2


1
2
3
4
5
6
7
8
9
10


#### 2. Printing Numbers and Strings

In [9]:
acc = 0
for x in range(1, 10):
    acc += x
    print(f"Adding {x}, sum so far is {acc}")

print(f"Final sum: {acc}")

Adding 1, sum so far is 1
Adding 2, sum so far is 3
Adding 3, sum so far is 6
Adding 4, sum so far is 10
Adding 5, sum so far is 15
Adding 6, sum so far is 21
Adding 7, sum so far is 28
Adding 8, sum so far is 36
Adding 9, sum so far is 45
Final sum: 45
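
The accumulator loop above relies on `range(1, 10)` being half-open: the start value is included and the stop value is not. The short sketch below makes that explicit and shows the built-in `sum` and the optional step argument; the variable names are only illustrative.

```python
# range(start, stop) is half-open: start is included, stop is not.
numbers = list(range(1, 10))
print(numbers)

# The built-in sum gives the same total as the accumulator loop.
total = sum(range(1, 10))
print(total)

# An optional third argument sets the step.
odds = list(range(1, 10, 2))
print(odds)
```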


#### 3.  Lists and Sets

In [10]:
c = ['a', 'b', 'c', 'd']
print(c)

['a', 'b', 'c', 'd']


In [11]:
# Iterate over a collection.
for s in c:
    print(s)

a
b
c
d


In [12]:
# Iterate over a collection and know your index.  (Python is zero-based!)
# Use a separate loop variable so the list c is not overwritten.
for i, s in enumerate(c):
    print(f"{i}:{s}")

0:a
1:b
2:c
3:d
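
If one-based numbering is wanted for display, `enumerate` accepts a `start` argument. A small sketch, with illustrative variable names:

```python
letters = ['a', 'b', 'c', 'd']

numbered = []
for position, letter in enumerate(letters, start=1):
    # position starts at 1 here; indexing letters[...] is still zero-based
    numbered.append(f"{position}:{letter}")

print(numbered)
```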


In [13]:
# Manually add items, lists allow duplicates
c = []
c.append('a')
c.append('b')
c.append('c')
c.append('c')
print(c)

['a', 'b', 'c', 'c']


In [14]:
# Manually add items, sets do not allow duplicates
# Sets add, lists append.  I find this annoying.
c = set()
c.add('a')
c.add('b')
c.add('c')
c.add('c')
print(c)

{'c', 'b', 'a'}
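
Note that the order in which a set prints is arbitrary and can differ between runs. The following sketch, using made-up items, shows a membership test, a sorted view for stable display, and one common way to remove duplicates from a list while keeping the original order.

```python
items = ['a', 'b', 'c', 'c', 'a']

unique = set(items)
print('c' in unique)      # membership test
print(len(unique))        # duplicates are stored once

# Sets are unordered, so sort when a stable display order is needed.
print(sorted(unique))

# dict.fromkeys keeps the first occurrence of each item, in order.
deduplicated = list(dict.fromkeys(items))
print(deduplicated)
```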


In [15]:
# Insert
c = ['a', 'b', 'c']
c.insert(0, 'a0')
print(c)
# Remove
c.remove('b')
print(c)
# Remove at index
del c[0]
print(c)

['a0', 'a', 'b', 'c']
['a0', 'a', 'c']
['a', 'c']


#### 4.  Maps/Dictionaries/Hash Tables

In [16]:
d = {'name': "Richard", 'address':"7 Build"}
print(d)
print(d['name'])

if 'name' in d:
    print("Name is defined")

if 'age' in d:
    print("age defined")
else:
    print("age undefined")

{'name': 'Richard', 'address': '7 Build'}
Richard
Name is defined
age undefined


In [17]:
d = {'name': "Richard", 'address':"7 Build"}
# All of the keys
print(f"Keys: {d.keys()}")

# All of the values
print(f"Values: {d.values()}")

Keys: dict_keys(['name', 'address'])
Values: dict_values(['Richard', '7 Build'])


In [18]:
# Python list & map structures
customers = [
    {'name': 'Richard & Lesley Jonson', 'pets': ['Tor', 'Oscar', 'Coco']},
    {'name': 'Anna McCartney', 'pets': ['sam']},
    {'name': 'Alexa Mason'}
]

print(customers)

for customer in customers:
    print(f"{customer['name']}:{customer.get('pets', 'no pets')}")

[{'name': 'Richard & Lesley Jonson', 'pets': ['Tor', 'Oscar', 'Coco']}, {'name': 'Anna McCartney', 'pets': ['sam']}, {'name': 'Alexa Mason'}]
Richard & Lesley Jonson:['Tor', 'Oscar', 'Coco']
Anna McCartney:['sam']
Alexa Mason:no pets
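
The `get` call above is what prevents an error for the customer without a `pets` key. As a small sketch using a shortened, illustrative copy of the customer list, the same idea can be used to count pets, and `items()` iterates over key/value pairs together.

```python
customers = [
    {'name': 'Anna McCartney', 'pets': ['sam']},
    {'name': 'Alexa Mason'},
]

pet_counts = {}
for customer in customers:
    # get returns the default (an empty list) when 'pets' is absent,
    # whereas customer['pets'] would raise KeyError.
    pets = customer.get('pets', [])
    pet_counts[customer['name']] = len(pets)

for name, count in pet_counts.items():
    print(f"{name} has {count} pet(s)")
```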


# Python for AI

Pandas
======
[Pandas](http://pandas.pydata.org/) is an open source library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.  It is based on the [dataframe](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html) concept.  For this module, Pandas will be the primary means by which data is manipulated in conjunction with neural networks.

The dataframe is a key component of Pandas.  We will use it to access the [auto-mpg dataset](https://archive.ics.uci.edu/ml/datasets/Auto+MPG).  This dataset can be found on the UCI machine learning repository.  For this module we will use a version of the Auto MPG dataset where column headers were added.

This dataset was taken from the StatLib library which is maintained at Carnegie Mellon University. The dataset was used in the 1983 American Statistical Association Exposition.  It contains data for 398 cars, including [mpg](https://en.wikipedia.org/wiki/Fuel_economy_in_automobiles), [cylinders](https://en.wikipedia.org/wiki/Cylinder_(engine)), [displacement](https://en.wikipedia.org/wiki/Engine_displacement), [horsepower](https://en.wikipedia.org/wiki/Horsepower), weight, acceleration, model year, origin and the car's name.

The following code loads the MPG dataset into a dataframe:

In [None]:
import os
import pandas as pd

path = "."  #absolute or relative path to the folder containing the file. 
            #"." for current folder

filename_read = os.path.join(path, "auto-mpg.csv")
df = pd.read_csv(filename_read)
print(df[0:5])
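
To experiment without the data file, `read_csv` also accepts a file-like object. The sketch below uses a tiny in-memory CSV whose rows are made up for illustration (only a few column names are borrowed from the description above), and shows `head()` as an alternative to slicing with `df[0:5]`.

```python
import io
import pandas as pd

# Illustrative stand-in for a CSV file; the values are made up.
csv_text = """mpg,horsepower,weight,name
18.0,130,3504,car a
15.0,?,3693,car b
24.0,95,2372,car c
"""

# na_values tells pandas which strings to treat as missing.
df = pd.read_csv(io.StringIO(csv_text), na_values=['NA', '?'])

print(df.head())          # first rows (up to five by default)
print(df.shape)           # (rows, columns)
print(df.dtypes)          # horsepower becomes float64 because of the NaN
```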

In [None]:
# Perform basic statistics on a dataframe.

import os
import pandas as pd

path = "."

filename_read = os.path.join(path, "auto-mpg.csv")
df = pd.read_csv(filename_read, na_values=['NA', '?'])

# Strip non-numerics
# 'number' matches every integer and float dtype, whatever the platform's default int size
df = df.select_dtypes(include='number')

headers = list(df.columns.values)
fields = []

for field in headers:
    fields.append({
        'name' : field,
        'mean': df[field].mean(),
        'var': df[field].var(),
        'sdev': df[field].std()
    })

for field in fields:
    print(field)
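
Pandas can also produce these statistics without an explicit loop. The sketch below uses a small made-up dataframe rather than the MPG file; `agg` and `describe` are standard DataFrame methods, and both `var` and `std` in Pandas divide by N - 1 by default.

```python
import pandas as pd

# Illustrative values only.
df = pd.DataFrame({
    'mpg': [18.0, 15.0, 24.0, 31.0],
    'weight': [3504, 3693, 2372, 1985],
    'name': ['car a', 'car b', 'car c', 'car d'],
})

numeric = df.select_dtypes(include='number')   # drops the text column

summary = numeric.agg(['mean', 'var', 'std'])  # one row per statistic
print(summary)

print(numeric.describe())                      # count, mean, std, quartiles
```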

### Sorting and Shuffling Data frames

In [None]:
import os
import pandas as pd
import numpy as np

path = "."

filename_read = os.path.join(path, "auto-mpg.csv")
df = pd.read_csv(filename_read, na_values=['NA', '?'])
#np.random.seed(42) # Uncomment this line to get the same shuffle each time
df = df.reindex(np.random.permutation(df.index))
df.reset_index(inplace=True, drop=True)
df
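
An alternative way to shuffle, sketched here on a small made-up dataframe, is `DataFrame.sample` with `frac=1`. Passing `random_state` makes the shuffle repeatable without setting NumPy's global seed.

```python
import pandas as pd

df = pd.DataFrame({'id': range(10)})   # illustrative data

# frac=1 draws every row exactly once, which amounts to a shuffle.
shuffled = df.sample(frac=1, random_state=42).reset_index(drop=True)
print(shuffled['id'].tolist())
```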

In [None]:
import os
import pandas as pd

path = "."

filename_read = os.path.join(path, "auto-mpg.csv")
df = pd.read_csv(filename_read, na_values=['NA', '?'])
df = df.sort_values(by='name', ascending=True)
print(f"The first car is: {df['name'].iloc[0]}")
print(df[0:5])

### Saving a Data frame

In [None]:
import os
import pandas as pd
import numpy as np

path = "."

filename_read = os.path.join(path, "auto-mpg.csv")
filename_write = os.path.join(path, "auto-mpg-shuffle.csv")
df = pd.read_csv(filename_read, na_values=['NA', '?'])
df = df.reindex(np.random.permutation(df.index))
df.to_csv(filename_write, index=False) # index=False leaves the row index out of the file
print("Done")

### Dropping Fields

Some fields are of no value to the neural network and can be dropped.  The following code removes the name column from the MPG dataset.

In [None]:
import os
import pandas as pd

path = "."

filename_read = os.path.join(path, "auto-mpg.csv")
df = pd.read_csv(filename_read, na_values=['NA', '?'])

print(f"Before drop: {df.columns}")
df.drop(columns='name', inplace=True)  # the positional axis argument was removed in pandas 2.0
print(f"After drop: {df.columns}")

### Calculated Fields

It is possible to add new fields to the dataframe that are calculated from the other fields.  We can create a new column that gives the weight in kilograms.  The equation to calculate a metric weight, given a weight in pounds is:

$ m_{(kg)} = m_{(lb)} \times 0.45359237 $

This can be used with the following Python code:

In [None]:
import os
import pandas as pd

path = "."

filename_read = os.path.join(path, "auto-mpg.csv")
df = pd.read_csv(filename_read, na_values=['NA', '?'])
df.insert(1, 'weight_kg', (df['weight'] * 0.45359237).astype(int))
df
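
One detail worth knowing: `astype(int)` truncates the fractional part rather than rounding. The sketch below, on three made-up weights in pounds, compares truncation with rounding to the nearest kilogram; the column names are illustrative.

```python
import pandas as pd

LB_TO_KG = 0.45359237

df = pd.DataFrame({'weight': [3504, 2372, 1985]})  # illustrative pounds

kg = df['weight'] * LB_TO_KG
df['weight_kg_truncated'] = kg.astype(int)         # drops the fraction
df['weight_kg_rounded'] = kg.round().astype(int)   # nearest integer
print(df)
```

Which behaviour is appropriate depends on the use; for a neural network input, keeping the unrounded floating-point value is also a reasonable choice.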

### Field Transformation & Preprocessing

The data fed into a machine learning model rarely bears much similarity to the data that the data scientist originally received. One common transformation is to normalize the inputs.  A normalization allows numbers to be put in a standard form so that two values can easily be compared.  Consider if a friend told you that they received a $10 discount.  Is this a good deal?  Maybe.  But the value is not normalized.  If your friend purchased a car, then the discount is not that good.  If your friend purchased dinner, this is a very good discount!

Percentages are a very common form of normalization.  If your friend tells you they got 10% off, we know that this is a better discount than 5%.  It does not matter how much the purchase price was.  One very common machine learning normalization is the Z-Score:

$z = {x- \mu \over \sigma} $

To calculate the Z-Score you also need the mean ($\mu$) and the standard deviation ($\sigma$).  The mean is calculated as follows:

$\mu = \bar{x} = \frac{x_1+x_2+\cdots +x_N}{N}$

The standard deviation is calculated as follows:

$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^N (x_i - \mu)^2}, {\rm \ \ where\ \ } \mu = \frac{1}{N} \sum_{i=1}^N x_i$

The following Python code replaces the mpg with a z-score.  Cars with average MPG will be near zero, above zero is above average, and below zero is below average.  Z-Scores above 3 or below -3 are very rare; such values are usually treated as outliers.

In [None]:
import os
import pandas as pd
from scipy.stats import zscore

path = "."

filename_read = os.path.join(path, "auto-mpg.csv")
df = pd.read_csv(filename_read, na_values=['NA', '?'])
df['mpg'] = zscore(df['mpg'])
df
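
To connect the formulas to code, here is a small sketch that computes z-scores by hand with the standard library, using four made-up mpg values. `statistics.pstdev` divides by N, matching the standard deviation formula above and the default of `scipy.stats.zscore`; note that Pandas' own `std()` divides by N - 1 by default, so it gives slightly different numbers.

```python
import statistics

values = [18.0, 15.0, 24.0, 31.0]    # illustrative mpg values

mu = statistics.mean(values)          # mean
sigma = statistics.pstdev(values)     # population standard deviation (divides by N)

z_scores = [(x - mu) / sigma for x in values]
print([round(z, 3) for z in z_scores])

# After the transformation the values have mean 0 and standard deviation 1.
print(round(statistics.mean(z_scores), 6), round(statistics.pstdev(z_scores), 6))
```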

### Missing Values

Missing values are a reality of machine learning.  Ideally every row of data would have a value in every column, but this is rarely the case.  Most of the values are present in the MPG dataset; however, the horsepower column has some missing values.  A common practice is to replace missing values with the median value for that column.  The median is calculated as described [here](https://www.mathsisfun.com/median.html).  The following code replaces any NA values in horsepower with the median:

In [None]:
import os
import pandas as pd

path = "."

filename_read = os.path.join(path, "auto-mpg.csv")
df = pd.read_csv(filename_read, na_values=['NA', '?'])
med = df['horsepower'].median()
df['horsepower'] = df['horsepower'].fillna(med)
# df = df.dropna() # you can also simply drop NA values
print(f"horsepower has na? {pd.isnull(df['horsepower']).values.any()}")
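
The same steps can be followed on a tiny made-up column, which makes the effect easy to check by eye. The values below are illustrative and not taken from the MPG dataset.

```python
import pandas as pd

df = pd.DataFrame({'horsepower': [130.0, None, 95.0, 150.0, None]})

print(df['horsepower'].isna().sum())   # number of missing values

med = df['horsepower'].median()         # missing values are skipped
filled = df['horsepower'].fillna(med)

print(med)
print(filled.tolist())
```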

### Concatenating Rows and Columns

Rows and columns can be concatenated together to form new data frames.

In [None]:
# Create a new dataframe from name and horsepower

import os
import pandas as pd

path = "."

filename_read = os.path.join(path, "auto-mpg.csv")
df = pd.read_csv(filename_read, na_values=['NA', '?'])
col_horsepower = df['horsepower']
col_name = df['name']
result = pd.concat([col_name, col_horsepower], axis=1)
result
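
The cell above joins columns with `axis=1`. Rows are concatenated the same way with `axis=0`; the sketch below uses a small made-up dataframe and illustrative variable names.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['car a', 'car b', 'car c', 'car d'],
    'horsepower': [130, 165, 95, 88],
})

first_row = df[0:1]
last_two = df[-2:]

# axis=0 stacks rows; ignore_index renumbers the result from 0.
rows = pd.concat([first_row, last_two], axis=0, ignore_index=True)
print(rows)
```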

### Accessing Files directly

It is possible to access files directly, rather than through Pandas.  Using Python's built-in `csv` module, you can read a file line by line and process each row as it arrives, which lets you handle very large files that would not fit into memory.  For the purposes of this class, all files will fit into memory, and you should use Pandas for all class assignments.  

In [None]:
# Read a raw text file (avoid this)
import codecs
import os

path = "."

# Always specify your encoding! There is no such thing as "it's just a text file".
# See... http://www.joelonsoftware.com/articles/Unicode.html
# Also see... http://www.utf8everywhere.org/
encoding = 'utf-8'
filename = os.path.join(path, "auto-mpg.csv")

c = 0

with codecs.open(filename, "r", encoding) as fh:
    # Iterate over this line by line...
    for line in fh:
        c += 1 # Only the first 5 lines
        if c > 5:
            break
        print(line.strip())

In [None]:
# Read a CSV file
import codecs
import os
import csv

encoding = 'utf-8'
path = "."
filename = os.path.join(path, "auto-mpg.csv")

c = 0

with codecs.open(filename, "r", encoding) as fh:
    reader = csv.reader(fh)
    for row in reader:
        c += 1
        if c > 5: 
            break
        print(row)

In [None]:
# Read a CSV, symbolic headers
import codecs
import os
import csv

path = "."

encoding = 'utf-8'
filename = os.path.join(path, "auto-mpg.csv")

c = 0

with codecs.open(filename, "r", encoding) as fh:
    reader = csv.reader(fh)

    # Generate header index using comprehension.
    # Comprehension is cool, but not necessarily a beginner's feature of Python.
    header_idx = {key: value for (value, key) in enumerate(next(reader))}

    for row in reader:
        c += 1
        if c > 5:
            break
        print(f"Car Name: {row[header_idx['name']]}")
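
The standard library also offers `csv.DictReader`, which builds the header lookup for you and yields each row as a dictionary. This sketch reads from an in-memory string with made-up rows so that it runs without the data file.

```python
import csv
import io

# Illustrative stand-in for the file contents; the values are made up.
csv_text = "mpg,horsepower,name\n18.0,130,car a\n24.0,95,car c\n"

# With a real file, use open(filename, newline='', encoding='utf-8') instead.
with io.StringIO(csv_text) as fh:
    reader = csv.DictReader(fh)     # the first row supplies the keys
    names = []
    for row in reader:
        names.append(row['name'])   # access by column name, not position

print(names)
```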

In [None]:
# Read a CSV, manual stats
import codecs
import os
import csv
import math

path = "."

encoding = 'utf-8'
filename_read = os.path.join(path, "auto-mpg.csv")
filename_write = os.path.join(path, "auto-mpg-norm.csv")

c = 0

with codecs.open(filename_read, "r", encoding) as fh:
    reader = csv.reader(fh)

    # Generate header index using comprehension.
    # Comprehension is cool, but not necessarily a beginner's feature of Python.
    header_idx = {key: value for (value, key) in enumerate(next(reader))}
    headers = header_idx.keys()

    fields = {key: value for (key, value) in [(key, {'count':0, 'sum':0, 'variance':0}) for key in headers]}

    # Pass 1, means
    row_count = 0
    for row in reader:
        row_count += 1
        for name in headers:
            try:
                value = float(row[header_idx[name]])
                field = fields[name]
                field['count'] += 1
                field['sum'] += value
            except ValueError:
                pass

    # Calculate means, toss sums (part of pass 1)
    for field in fields.values():
        # If 90% are not missing (or non-numeric) calculate a mean
        if (field['count'] / row_count) > 0.9:
            field['mean'] = field['sum'] / field['count']
            del field['sum']

    # Pass 2, standard deviation & variance
    fh.seek(0)
    for row in reader:
        for name in headers:
            try:
                value = float(row[header_idx[name]])
                field = fields[name]
                # If we failed to calculate a mean, no variance.
                if 'mean' in field:
                    field['variance'] += (value - field['mean'])**2
            except ValueError:
                pass

    # Calculate standard deviation, keep variance (part of pass 2)
    for field in fields.values():
        # If no variance, then no standard deviation
        if 'mean' in field:
            field['variance'] /= field['count']
            field['sdev'] = math.sqrt(field['variance'])
        else:
            del field['variance']

    # Print summary stats
    for key in sorted(fields.keys()):
        print(f"{key}:{fields[key]}")

# Exercises

For these exercises, you should use **Spyder** as your IDE.  


**Question 1** Write Python code to produce the small multiplication square below.  You might  want to refer to the documentation https://docs.python.org/3/index.html for print, and consider the use of the end argument, and the ljust method.   

<table>
<tr>
    <td>1</td> <td>2</td> <td>3</td>  <td>4</td>  <td>5</td>
</tr>
<tr>
    <td>2</td>  <td>4</td>  <td>6</td>  <td>8</td>  <td>10</td>
</tr>
<tr>
    <td>3</td>  <td>6</td>  <td>9</td>  <td>12</td> <td>15</td> 
</tr>
<tr>
    <td>4</td>  <td>8</td>  <td>12</td> <td>16</td> <td>20</td> 
</tr>
<tr>
    <td>5</td>  <td>10</td> <td>15</td> <td>20</td> <td>25</td> 
</tr>
</table>
<p></p>
Now adapt this so that when an even number is found, a 0 is printed.<br>
<table>
<tr>
    <td>1</td> <td>0</td> <td>3</td>  <td>0</td>  <td>5</td>
</tr>
<tr>
    <td>0</td>  <td>0</td>  <td>0</td>  <td>0</td>  <td>0</td>
</tr>
<tr>
    <td>3</td>  <td>0</td>  <td>9</td>  <td>0</td> <td>15</td> 
</tr>
<tr>
    <td>0</td>  <td>0</td>  <td>0</td> <td>0</td> <td>0</td> 
</tr>
<tr>
    <td>5</td>  <td>0</td> <td>15</td> <td>0</td> <td>25</td> 
</tr>
</table>

**Question 2** The iris dataset will recur in these tutorials.  Similarly to the examples in the tutorial above:

- Load the dataset
- Print the dataset
- Print the rows in the dataset where petal_w < 1.0
- Sort the dataset by sepal_l and print this
- Save the sorted dataset to a new file
- Calculate the variance of sepal_l

# Answers

**Q1(a)**: Print a small multiplication square

In [33]:
start = 1
end = 6

for row in range(start, end):
    for col in range(start, end):
        res = row * col
        print(res, end="\t")
    print()

1	2	3	4	5	
2	4	6	8	10	
3	6	9	12	15	
4	8	12	16	20	
5	10	15	20	25	


**Q1(b)**: Print a small multiplication square, even numbers replaced by 0

In [34]:
start = 1
end = 6

for row in range(start, end):
    for col in range(start, end):
        res = row * col
        if res % 2 == 0:
            res = 0
        print(res, end="\t")
    print()

1	0	3	0	5	
0	0	0	0	0	
3	0	9	0	15	
0	0	0	0	0	
5	0	15	0	25	
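
The question also suggests `ljust`. As an alternative sketch for Q1(a), each product can be padded to a fixed width instead of separated by tabs, which keeps the columns aligned regardless of tab settings. The `width` value and the `lines` list are illustrative choices.

```python
start = 1
end = 6
width = 4   # column width: the longest product has two digits, plus a gap

lines = []
for row in range(start, end):
    cells = []
    for col in range(start, end):
        cells.append(str(row * col).ljust(width))  # pad on the right with spaces
    lines.append("".join(cells).rstrip())

print("\n".join(lines))
```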


**Q2**: [Iris dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html)

In [44]:
from sklearn.datasets import load_iris
import pandas as pd 
import numpy as np

# Load the dataset
dataset = load_iris()

In [22]:
# Print the dataset
dataset

{'data': array([[5.1, 3.5, 1.4, 0.2],
        [4.9, 3. , 1.4, 0.2],
        [4.7, 3.2, 1.3, 0.2],
        [4.6, 3.1, 1.5, 0.2],
        [5. , 3.6, 1.4, 0.2],
        [5.4, 3.9, 1.7, 0.4],
        [4.6, 3.4, 1.4, 0.3],
        [5. , 3.4, 1.5, 0.2],
        [4.4, 2.9, 1.4, 0.2],
        [4.9, 3.1, 1.5, 0.1],
        [5.4, 3.7, 1.5, 0.2],
        [4.8, 3.4, 1.6, 0.2],
        [4.8, 3. , 1.4, 0.1],
        [4.3, 3. , 1.1, 0.1],
        [5.8, 4. , 1.2, 0.2],
        [5.7, 4.4, 1.5, 0.4],
        [5.4, 3.9, 1.3, 0.4],
        [5.1, 3.5, 1.4, 0.3],
        [5.7, 3.8, 1.7, 0.3],
        [5.1, 3.8, 1.5, 0.3],
        [5.4, 3.4, 1.7, 0.2],
        [5.1, 3.7, 1.5, 0.4],
        [4.6, 3.6, 1. , 0.2],
        [5.1, 3.3, 1.7, 0.5],
        [4.8, 3.4, 1.9, 0.2],
        [5. , 3. , 1.6, 0.2],
        [5. , 3.4, 1.6, 0.4],
        [5.2, 3.5, 1.5, 0.2],
        [5.2, 3.4, 1.4, 0.2],
        [4.7, 3.2, 1.6, 0.2],
        [4.8, 3.1, 1.6, 0.2],
        [5.4, 3.4, 1.5, 0.4],
        [5.2, 4.1, 1.5, 0.1],
  

In [57]:
# Print the rows in the dataset where petal_w < 1.0
readings = dataset.data

# Petal width is the fourth column (index 3); see dataset.feature_names
for row in readings:
    if row[3] < 1.0:
        print(row)

[5.1 3.5 1.4 0.2]
[4.9 3.  1.4 0.2]
[4.7 3.2 1.3 0.2]
[4.6 3.1 1.5 0.2]
[5.  3.6 1.4 0.2]
[5.4 3.9 1.7 0.4]
[4.6 3.4 1.4 0.3]
[5.  3.4 1.5 0.2]
[4.4 2.9 1.4 0.2]
[4.9 3.1 1.5 0.1]
[5.4 3.7 1.5 0.2]
[4.8 3.4 1.6 0.2]
[4.8 3.  1.4 0.1]
[4.3 3.  1.1 0.1]
[5.8 4.  1.2 0.2]
[5.7 4.4 1.5 0.4]
[5.4 3.9 1.3 0.4]
[5.1 3.5 1.4 0.3]
[5.7 3.8 1.7 0.3]
[5.1 3.8 1.5 0.3]
[5.4 3.4 1.7 0.2]
[5.1 3.7 1.5 0.4]
[4.6 3.6 1.  0.2]
[5.1 3.3 1.7 0.5]
[4.8 3.4 1.9 0.2]
[5.  3.  1.6 0.2]
[5.  3.4 1.6 0.4]
[5.2 3.5 1.5 0.2]
[5.2 3.4 1.4 0.2]
[4.7 3.2 1.6 0.2]
[4.8 3.1 1.6 0.2]
[5.4 3.4 1.5 0.4]
[5.2 4.1 1.5 0.1]
[5.5 4.2 1.4 0.2]
[4.9 3.1 1.5 0.2]
[5.  3.2 1.2 0.2]
[5.5 3.5 1.3 0.2]
[4.9 3.6 1.4 0.1]
[4.4 3.  1.3 0.2]
[5.1 3.4 1.5 0.2]
[5.  3.5 1.3 0.3]
[4.5 2.3 1.3 0.3]
[4.4 3.2 1.3 0.2]
[5.  3.5 1.6 0.6]
[5.1 3.8 1.9 0.4]
[4.8 3.  1.4 0.3]
[5.1 3.8 1.6 0.2]
[4.6 3.2 1.4 0.2]
[5.3 3.7 1.5 0.2]
[5.  3.3 1.4 0.2]


In [49]:
dataset.feature_names

['sepal length (cm)',
 'sepal width (cm)',
 'petal length (cm)',
 'petal width (cm)']

In [64]:
# Sort the dataset by sepal_l and print this
# Sepal length is column 0. Reordering whole rows with argsort keeps each
# flower's measurements together; calling sort(axis=0) would instead sort
# every column independently and scramble the rows.
sorted_dataset = readings[np.argsort(readings[:, 0], kind="stable")]
sorted_dataset[:5]  # show the first five rows

array([[4.3, 3. , 1.1, 0.1],
       [4.4, 2.9, 1.4, 0.2],
       [4.4, 3. , 1.3, 0.2],
       [4.4, 3.2, 1.3, 0.2],
       [4.5, 2.3, 1.3, 0.3]])

In [75]:
# Save the sorted dataset to a new file
with open("output.txt", "w") as txt_file:
    for line in sorted_dataset:
        txt_file.write(str(line) + "\n")

In [96]:
# Calculate the variance of sepal_l

sepal_var = np.var(sorted_dataset, axis=0, dtype='float64')
sepal_var[0]

0.6811222222222227
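
Since the question asks for an approach similar to the Pandas examples earlier in the tutorial, here is a hedged sketch of the same steps with a dataframe. The short column names and the output filename are assumptions chosen to match the wording of the exercise; `load_iris` itself returns the longer names shown in `feature_names`.

```python
from sklearn.datasets import load_iris
import pandas as pd

iris = load_iris()

# Short column names matching the exercise; the order follows iris.feature_names.
columns = ['sepal_l', 'sepal_w', 'petal_l', 'petal_w']
df = pd.DataFrame(iris.data, columns=columns)

small_petals = df[df['petal_w'] < 1.0]             # boolean-mask row filter
sorted_df = df.sort_values(by='sepal_l')            # whole rows move together
sorted_df.to_csv("iris-sorted.csv", index=False)    # hypothetical output filename

# pandas defaults to the sample variance (ddof=1); ddof=0 matches np.var.
var_population = df['sepal_l'].var(ddof=0)
var_sample = df['sepal_l'].var()
print(len(small_petals), var_population, var_sample)
```

The population variance agrees with the NumPy result above, while the default Pandas value is slightly larger because it divides by N - 1.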