<a href="https://colab.research.google.com/github/gtardy-iu/teach/blob/main/iris.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Analyzing the Iris Dataset with Python's Built-in Types and Functions

This notebook focuses on analyzing the Iris dataset using Python's built-in types and functions. You will work with lists, tuples, and strings to extract meaningful insights from the data.

Associated Web Page

- [Analyzing 2D Data](https://www.jdatalab.com/pie/py2/lab-analyze-iris.html)

## Open this notebook in Colab

One way to run this notebook is to open it in [Google Colab](https://colab.research.google.com/), which provides a free online environment with Python and Jupyter notebooks.

To open this notebook in Colab from a private GitHub repository, follow these steps:
- Log in to your Google account.
- Navigate to [colab.research.google.com/github](https://colab.research.google.com/github/).
- Check the `Include Private Repos` box.
- Sign in to your GitHub account and authorize Colab to access your private repos.
- Your private repositories and notebooks will now be available in the GitHub navigation pane.
- Select the repository and then the notebook you want to open.

## Simple way: open this notebook directly with the Colab badge

- Simply click the `Open in Colab` button below to open this notebook in Google Colab.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1nzGTrFSF6ZN5Vu6jQyxOhE2xy6FUZbXc?usp=drive_link)

The button above opens this notebook on a Colab page, where you can edit and run the notebook cells interactively.

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [4]:

# For Colab users only:
# When this notebook is open in Colab, run this cell to download iris.csv from Google Drive.

# If you are using VS Code, you can skip this cell.

!gdown 1Sn8L6B7CxFc8Sp2k2y3R7-aNyTErhG_f

# After running the cell, find iris.csv under Files in the left sidebar.
# Double-click the file to view it. It has 150 entries with 5 columns each.

Downloading...
From: https://drive.google.com/uc?id=1Sn8L6B7CxFc8Sp2k2y3R7-aNyTErhG_f
To: /content/iris.csv
100% 3.98k/3.98k [00:00<00:00, 15.2MB/s]


In [5]:
# 0. Load dependencies
import os
import csv
# matplotlib is pre-installed in Colab (elsewhere: pip install matplotlib)
import matplotlib.pyplot as plt

In [6]:
# 1. Read data
# Each CSV row becomes one tuple: four float measurements plus the lowercase variety name.
with open('iris.csv', 'r', encoding='UTF-8', newline='') as f:
    iris = [
        # parse each value into one item in a tuple
        (
            float(row['sepal.length']),
            float(row['sepal.width']),
            float(row['petal.length']),
            float(row['petal.width']),
            row['variety'].lower()
        )
        for row in csv.DictReader(f)  # DictReader keys each row by the header names
    ]  # list comprehension
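
The same parsing pattern can be tried without `iris.csv` on disk. The following is a minimal sketch, assuming the file's header row uses the column names seen in the cell above; `sample_csv` and `sample_rows` are hypothetical names, the three data rows mirror the first rows printed later in this notebook, and the capitalized variety spelling is an assumption.

```python
import csv
import io

# Hypothetical in-memory stand-in for iris.csv (header names assumed from the cell above).
sample_csv = io.StringIO(
    '"sepal.length","sepal.width","petal.length","petal.width","variety"\n'
    '5.1,3.5,1.4,0.2,"Setosa"\n'
    '4.9,3.0,1.4,0.2,"Setosa"\n'
    '4.7,3.2,1.3,0.2,"Setosa"\n'
)

# csv.DictReader maps each data row to a dict keyed by the header row.
# Every value arrives as a string, so numbers must be converted explicitly.
sample_rows = [
    (
        float(row['sepal.length']),
        float(row['sepal.width']),
        float(row['petal.length']),
        float(row['petal.width']),
        row['variety'].lower(),
    )
    for row in csv.DictReader(sample_csv)
]

print(sample_rows[0])
```

Storing each row as a tuple keeps the record immutable, while the outer list can still be sliced and looped over.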

In [7]:
# 2. Inspect data
print(iris[0:5]) # slice operator prints the first 5 rows

# Practice indexing and slicing on a 2D list (a list of row tuples)
print(f'''
First row: {iris[0]}
First two rows: {iris[0:2]}
First cell: {iris[0][0]}
''')

# Get the 2nd column (index 1), which holds the sepal.width values
sepal_width = [row[1] for row in iris]
print(f'sepal.width: {sepal_width}')

[(5.1, 3.5, 1.4, 0.2, 'setosa'), (4.9, 3.0, 1.4, 0.2, 'setosa'), (4.7, 3.2, 1.3, 0.2, 'setosa'), (4.6, 3.1, 1.5, 0.2, 'setosa'), (5.0, 3.6, 1.4, 0.2, 'setosa')]

First row: (5.1, 3.5, 1.4, 0.2, 'setosa')
First two rows: [(5.1, 3.5, 1.4, 0.2, 'setosa'), (4.9, 3.0, 1.4, 0.2, 'setosa')]
First cell: 5.1

sepal.width: [3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3.4, 3.0, 3.0, 4.0, 4.4, 3.9, 3.5, 3.8, 3.8, 3.4, 3.7, 3.6, 3.3, 3.4, 3.0, 3.4, 3.5, 3.4, 3.2, 3.1, 3.4, 4.1, 4.2, 3.1, 3.2, 3.5, 3.6, 3.0, 3.4, 3.5, 2.3, 3.2, 3.5, 3.8, 3.0, 3.8, 3.2, 3.7, 3.3, 3.2, 3.2, 3.1, 2.3, 2.8, 2.8, 3.3, 2.4, 2.9, 2.7, 2.0, 3.0, 2.2, 2.9, 2.9, 3.1, 3.0, 2.7, 2.2, 2.5, 3.2, 2.8, 2.5, 2.8, 2.9, 3.0, 2.8, 3.0, 2.9, 2.6, 2.4, 2.4, 2.7, 2.7, 3.0, 3.4, 3.1, 2.3, 3.0, 2.5, 2.6, 3.0, 2.6, 2.3, 2.7, 3.0, 2.9, 2.9, 2.5, 2.8, 3.3, 2.7, 3.0, 2.9, 3.0, 3.0, 2.5, 2.9, 2.5, 3.6, 3.2, 2.7, 3.0, 2.5, 2.8, 3.2, 3.0, 3.8, 2.6, 2.2, 3.2, 2.8, 2.8, 2.7, 3.3, 3.2, 2.8, 3.0, 2.8, 3.0, 2.8, 3.8, 2.8, 2.8, 2.6, 3.0, 3.4,

In [None]:
# 3. Data size
row_count = len(iris)
col_count = ...

print(f'''
Number of iris samples: {row_count}
Number of variables/attributes: {col_count}
''')
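
One possible approach to measuring a list of row tuples is sketched below on a small stand-in, not on the full dataset. The names `sample`, `n_rows`, and `n_cols` are hypothetical, and the sketch assumes every row has the same number of items.

```python
# Hypothetical three-row sample in the same layout as `iris`
# (four floats followed by one string per row).
sample = [
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (4.9, 3.0, 1.4, 0.2, 'setosa'),
    (4.7, 3.2, 1.3, 0.2, 'setosa'),
]

n_rows = len(sample)      # number of tuples in the outer list
n_cols = len(sample[0])   # number of items in one row tuple

print(n_rows, n_cols)
```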

In [None]:
# 4. Analyze sepal.length column
sepal_length = ...
#print(sepal_length)

s_max = ...
s_min = ...
s_avg = ...

print(f'''
sepal_length summary:
maximum : {s_max:.3f}
minimum : {s_min:.3f}
average : {s_avg:.3f}
''')
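
A column summary can be built from the built-in functions `max`, `min`, `sum`, and `len`. The following sketch shows one way to do it on a hypothetical three-row sample; `sample`, `lengths`, `longest`, `shortest`, and `mean` are illustrative names, not part of the exercise.

```python
# Hypothetical sample rows in the same layout as `iris`.
sample = [
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (4.9, 3.0, 1.4, 0.2, 'setosa'),
    (4.7, 3.2, 1.3, 0.2, 'setosa'),
]

# Pull the first column (index 0) out of every row.
lengths = [row[0] for row in sample]

longest = max(lengths)
shortest = min(lengths)
mean = sum(lengths) / len(lengths)  # arithmetic mean from built-ins only

print(f'max={longest:.3f} min={shortest:.3f} avg={mean:.3f}')
```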

In [None]:
# 5. Find the variety name(s) with the maximum sepal_length
variety_with_max_sepal_length = ...

print(f'''
Variety name(s) having maximum sepal_length: {variety_with_max_sepal_length}
''')
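
Because more than one row can reach the maximum, a search of this kind usually filters rows rather than stopping at the first match. The sketch below illustrates that idea on toy rows invented for this example (they are not taken from `iris.csv`); `toy`, `top`, and `top_varieties` are hypothetical names.

```python
# Toy rows invented for illustration (not taken from iris.csv).
toy = [
    (5.0, 3.0, 1.5, 0.2, 'setosa'),
    (7.0, 3.1, 4.7, 1.4, 'versicolor'),
    (7.0, 3.0, 5.9, 2.1, 'virginica'),
    (6.0, 2.9, 4.5, 1.5, 'versicolor'),
]

# Largest value in the first column.
top = max(row[0] for row in toy)

# Keep the variety of every row that reaches the maximum;
# the set removes duplicates and sorted() gives a stable order.
top_varieties = sorted({row[4] for row in toy if row[0] == top})

print(top_varieties)
```

Comparing floats with `==` is safe here only because `top` is one of the stored values itself, not the result of arithmetic.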

In [None]:
# 6. Plot distribution of variety column in bar chart
# and save plot to file

# Extract the variety column into a list
variety = ...

# x holds the distinct varieties; y holds the frequency of each one
# x = set(variety)
x = ['setosa', 'versicolor', 'virginica']
y = [0] * len(x) # list of three zeros

# Write code here to populate y with frequency of each distinct variety
# Not Implemented
...
...

print(y)
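
Counting how often each label occurs can be done with a plain loop and `list.index`. The following is a sketch on a made-up label list, not on the real variety column; `toy_variety`, `labels`, and `counts` are hypothetical names.

```python
# Made-up labels for illustration (not the real variety column).
toy_variety = ['setosa', 'virginica', 'setosa', 'versicolor', 'virginica', 'setosa']

labels = ['setosa', 'versicolor', 'virginica']
counts = [0] * len(labels)  # one counter per label, in the same order

for name in toy_variety:
    # labels.index(name) finds the position of the counter to increment.
    counts[labels.index(name)] += 1

print(counts)
```

An equivalent one-liner is `[toy_variety.count(name) for name in labels]`, which scans the list once per label and is fine for a handful of labels.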


In [None]:
# 7. Draw the bar chart and save it to a file
plt.bar(x, y, color=('y','g','k'))
plt.title('Histogram of Iris Variety')
plt.xlabel('Variety')
plt.ylabel('Count/frequency')
#plt.show()
plt.savefig('iris_histogram.png')