# **Principles of Data Analytics - Tasks**

### Authored by: Stephen Kerr

#### **Assessment Links**

- The task descriptions are outlined in the following link: [Assessment Description][def1]
- The marking scheme is outlined in the following link: [Assessment Marking Scheme][def2]


[def1]: https://github.com/ianmcloughlin/principles_of_data_analytics/blob/main/assessment/tasks.md
[def2]: https://github.com/ianmcloughlin/principles_of_data_analytics/blob/main/assessment/instructions.md

## **Task 1: Source the Data Set**


### **Task 1 Description**

Import the Iris data set from the sklearn.datasets module.  
Explain, in your own words, what the load_iris() function returns.

### **Task 1 Submission:**

The **load_iris()** function loads the Iris dataset, which is a classic multi-class classification dataset.  
The dataset is returned as a *'Bunch'* object (a dictionary-like container whose keys can also be read as attributes) with attributes including:  
- **'data'**, which is the data matrix (150 samples by 4 features).
- **'target'**, which is the classification target (the class of each sample, encoded as 0, 1 or 2).
- **'feature_names'**, which is a list of the dataset column names.
- **'target_names'**, which is an array of the target class names.

The iris data was loaded with the parameter ***'as_frame'*** set to *True*, which results in:
- the **'data'** attribute being a pandas DataFrame.
- the **'target'** attribute being a pandas Series.

When load_iris() is called with 'as_frame' = *True*, the Bunch also provides a **'frame'** attribute, which is a pandas DataFrame combining data and target.
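
As a quick check of the description above, the following minimal sketch inspects the type of each attribute when the data is loaded with `as_frame=True`. The names `bunch` and `attribute_types` are illustrative only, and the sketch assumes scikit-learn and pandas are installed.

```python
from sklearn.datasets import load_iris

# Load the data set with pandas containers (as_frame=True).
bunch = load_iris(as_frame=True)

# Collect the type name of each attribute discussed above.
attribute_types = {
    name: type(bunch[name]).__name__
    for name in ("data", "target", "frame", "feature_names", "target_names")
}

for name, type_name in attribute_types.items():
    print(f"{name}: {type_name}")

# A Bunch supports both dictionary-style and attribute-style access.
same_object = bunch["data"] is bunch.data
print(f"bunch['data'] is bunch.data: {same_object}")
```

Either access style can be used; the rest of this notebook uses the dictionary style, for example `iris_data_bunch['frame']`.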

---

## References: 

1. [![load_iris](https://tse4.mm.bing.net/th?id=OIP.Hf2oXZgEGL98vH30SEeZQQAAAA&pid=Api&P=0&h=180) Click the image to learn more about load_iris](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html)
2. [Markdown Syntax Cheatsheet](https://www.markdown-cheatsheet.com/)

In [7]:
# Objective: Import the Iris data set from the sklearn.datasets module
# Author: Stephen Kerr

# Import load_iris directly; depending on the scikit-learn version, a bare
# 'import sklearn' may not load the 'datasets' submodule.
from sklearn.datasets import load_iris

# Load the iris data set as 'iris_data_bunch'.
# Note: the parameter as_frame=True returns the data as pandas objects.
iris_data_bunch = load_iris(as_frame=True)

# Print the 'iris_data_bunch' Bunch
print(iris_data_bunch)

{'data':      sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
0                  5.1               3.5                1.4               0.2
1                  4.9               3.0                1.4               0.2
2                  4.7               3.2                1.3               0.2
3                  4.6               3.1                1.5               0.2
4                  5.0               3.6                1.4               0.2
..                 ...               ...                ...               ...
145                6.7               3.0                5.2               2.3
146                6.3               2.5                5.0               1.9
147                6.5               3.0                5.2               2.0
148                6.2               3.4                5.4               2.3
149                5.9               3.0                5.1               1.8

[150 rows x 4 columns], 'target': 0      0
1      0
2 

## Task 2: Explore the Data Structure

### **Task 2 Description:** 

Print and explain the shape of the data set, the first and last 5 rows of the data, the feature names, and the target classes.

### **Task 2 Submission**



In [8]:
# Shape of the Iris data set.
# Note: the 'frame' attribute holds the 4 feature columns plus the 'target' column.
n_rows, n_cols = iris_data_bunch['frame'].shape

print('The shape of the Iris Data Set is:'
      f'\n    Rows (instances) = {n_rows},'
      f'\n    Columns (4 features + 1 target) = {n_cols}\n')

print(f'This means the Iris Data Set has {n_rows} Rows and {n_cols} Columns,'
      f'\nor {n_rows * n_cols} data values in total.\n')


The shape of the Iris Data Set is:
    Rows (instances) = 150,
    Columns (4 features + 1 target) = 5

This means the Iris Data Set has 150 Rows and 5 Columns,
or 750 data values in total.
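
The five columns reported above come from the `frame` attribute, which holds the four measurements plus the `target` column. A minimal sketch to confirm this, assuming scikit-learn and pandas are installed (the variable names are illustrative and independent of the cells in this notebook):

```python
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)

# 'data' holds only the measurements; 'frame' appends the 'target' column.
data_shape = iris.data.shape
frame_shape = iris.frame.shape

# Columns that appear in 'frame' but not in 'data'.
extra_columns = [c for c in iris.frame.columns if c not in iris.data.columns]

print(f"data shape:  {data_shape}")
print(f"frame shape: {frame_shape}")
print(f"columns only in frame: {extra_columns}")
```

Using `iris.data.shape` instead of `iris.frame.shape` would therefore give the number of features alone.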



In [None]:
# The first 5 rows of the Iris Data Set using the .head() Pandas method: 
print(iris_data_bunch['frame'].head(5))

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  \
0                5.1               3.5                1.4               0.2   
1                4.9               3.0                1.4               0.2   
2                4.7               3.2                1.3               0.2   
3                4.6               3.1                1.5               0.2   
4                5.0               3.6                1.4               0.2   

   target  
0       0  
1       0  
2       0  
3       0  
4       0  


In [19]:
# The last 5 rows of the Iris Data Set using the .tail() Pandas method:
print(iris_data_bunch['frame'].tail(5))

     sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  \
145                6.7               3.0                5.2               2.3   
146                6.3               2.5                5.0               1.9   
147                6.5               3.0                5.2               2.0   
148                6.2               3.4                5.4               2.3   
149                5.9               3.0                5.1               1.8   

     target  
145       2  
146       2  
147       2  
148       2  
149       2  


In [None]:
# The feature names are:

print('The following are all the feature names for the Iris Data Set:')
for n, name in enumerate(iris_data_bunch['feature_names'], start=1):
    print(f'\t {n}. {name}')

The following are all the feature names for the Iris Data Set:
	 1. sepal length (cm)
	 2. sepal width (cm)
	 3. petal length (cm)
	 4. petal width (cm)


In [None]:
# The target classes

print('The following are all the target names for the Iris Data Set:')
for n, name in enumerate(iris_data_bunch['target_names'], start=1):
    print(f'\t {n}. {name}')

The following are all the target names for the Iris Data Set:
	 1. setosa
	 2. versicolor
	 3. virginica
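
Note that the list above is numbered from 1 for display, whereas the `target` column stores the classes as the integer codes 0, 1 and 2, which index into `target_names`. The following sketch, assuming scikit-learn and pandas are installed, maps the codes to their names and counts the samples per class. The names `code_to_name`, `species` and `class_counts` are illustrative only.

```python
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)

# Build a lookup from integer code to class name, e.g. 0 -> 'setosa'.
code_to_name = {code: str(name) for code, name in enumerate(iris.target_names)}

# Translate the numeric target Series into readable species names.
species = iris.target.map(code_to_name)

# Count how many samples belong to each class.
class_counts = species.value_counts().to_dict()

print(code_to_name)
print(class_counts)
```

This explains why the first five rows show a target of 0 (setosa) and the last five rows show a target of 2 (virginica).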


## Task 3: Summarize the Data

### Task Description: 

XYZ

## Task 4: Visualize Features

### Task Description: 

XYZ

## Task 5: Investigate Relationships

### Task Description: 

XYZ

## Task 6: Analyze Relationships

### Task Description: 

XYZ

## Task 7: Analyze Class Distributions

### Task Description: 

XYZ

## Task 8: Compute Correlations

### Task Description: 

XYZ

## Task 9: Fit a Simple Linear Regression

### Task Description: 

XYZ

## Task 10: Too Many Features 

### Task Description: 

XYZ

# End