### Instructions

#### Goal of the Project

This project lets you practice the concepts covered in the following lessons by solving activities based on them:

  * Streamlit Framework I
  * Streamlit Framework II
  * Streamlit Widgets I

---

#### Getting Started:

1. Click on this link to open the Colab file for this project.

   https://colab.research.google.com/drive/14AXadNtfaPV0uCK7d89HHIFaEWXXvC1S

2. Create a duplicate copy of the Colab file as described below.

  - Click on the **File menu**. A new drop-down list will appear.

   <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/lesson-0/0_file_menu.png' width=500>

  - Click on the **Save a copy in Drive** option. A duplicate copy will be created and will open in a new tab of your web browser.

  <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/lesson-0/1_create_colab_duplicate_copy.png' width=500>

3. After creating the duplicate copy of the notebook, please rename it in the **YYYY-MM-DD_StudentName_Project94** format.

4. Now, write your code in the prescribed code cells.


---

#### Problem Statement

In this project, you are going to create a Census Data web app using the Streamlit framework to display the dataset in its raw form.



---

### Dataset Description

The dataset includes 32561 instances with 14 features and 1 target column, which are described below:

|Field|Description|
|---:|:---|
|age|age of the person, Integer.|
|work-class|employment information about the individual, Categorical.|
|fnlwgt|unknown weights, Integer.|
|education|highest level of education obtained, Categorical.|
|education-years|number of years of education, Integer.|
|marital-status|marital status of the person, Categorical.|
|occupation|job title, Categorical.|
|relationship|individual's relation in the family, such as wife or husband, Categorical.|
|race|Categorical.|
|sex|gender, Male or Female.|
|capital-gain|gain from sources other than salary/wages, Integer.|
|capital-loss|loss from sources other than salary/wages, Integer.|
|hours-per-week|hours worked per week, Integer.|
|native-country|name of the native country, Categorical.|
|income-group|annual income, Categorical, **<=50k** or **>50k**.|


**Notes:**
1. The dataset has no header row, so the column names have to be added manually.
2. Invalid values in the dataset are marked as **"?"**.
3. No information is available about **fnlwgt**, so it can be removed before model training.
4. Take note of the **whitespaces (" ")** before the values throughout the dataset.
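
The notes above can be illustrated with a minimal sketch. It is not the loading function used in this project; it is an alternative that handles the missing header, the **"?"** markers and the leading whitespaces directly in `pd.read_csv()`. The rows in `SAMPLE_CSV` are a small made-up sample in the same layout as the dataset, and the names `SAMPLE_CSV`, `COLUMN_NAMES` and `load_sample()` are assumptions introduced only for this illustration.

```python
import io

import pandas as pd

# Column names added manually because the file has no header row.
COLUMN_NAMES = ['age', 'workclass', 'fnlwgt', 'education', 'education-years',
                'marital-status', 'occupation', 'relationship', 'race', 'gender',
                'capital-gain', 'capital-loss', 'hours-per-week', 'native-country',
                'income']

# Illustrative sample only: three rows laid out like the real file,
# with a space after every comma and ' ?' marking invalid values.
SAMPLE_CSV = (
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
    "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n"
    "54, ?, 180211, Some-college, 10, Married-civ-spouse, ?, "
    "Husband, Asian-Pac-Islander, Male, 0, 0, 60, South, >50K\n"
    "50, Self-emp-not-inc, 83311, Bachelors, 13, Married-civ-spouse, "
    "Exec-managerial, Husband, White, Male, 0, 0, 13, United-States, <=50K\n"
)


def load_sample(source):
    # 'skipinitialspace=True' drops the space after each comma, so the
    # invalid marker is read as '?' and 'na_values' turns it into NaN.
    df = pd.read_csv(source, header=None, names=COLUMN_NAMES,
                     na_values='?', skipinitialspace=True)
    # Remove the rows with missing values and the 'fnlwgt' column.
    return df.dropna().drop(columns='fnlwgt')


sample_df = load_sample(io.StringIO(SAMPLE_CSV))
print(sample_df.shape)
```

Without `skipinitialspace=True`, every text value keeps its leading space, which is why the invalid values would otherwise have to be matched as `' ?'` rather than `'?'`.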



**Dataset Credits:** https://archive.ics.uci.edu/ml/datasets/adult

**Dataset Creator:**
```
Dua, D., & Graff, C.. (2017). UCI Machine Learning Repository.
```

---

### List of Activities

**Activity 1:** Create Python File for the Web App
  
**Activity 2:** Design the Web App to Display the Data

---

#### Activity 1: Create Python File for the Web App


In this activity, you have to create a Python file `census_app.py` in the Sublime editor and save it in the `Python_scripts` folder.

Copy the code given below into the `census_app.py` file. You have already seen this code; it defines a function that loads the data from the CSV file.

**Dataset Link:** https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/whitehat-ds-datasets/adult.csv


**Note:** Do not run the code shown below. It will throw an error.


In [None]:
# Open the Sublime text editor, create a new Python file, copy the following code into it and save it as 'census_app.py'.

# Import modules
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import streamlit as st


# Cache the loaded data so that the CSV file is not read again on every rerun.
# Note: newer Streamlit releases replace 'st.cache' with 'st.cache_data'.
@st.cache()
def load_data():
    # Load the Adult Income dataset into a DataFrame. The file has no header row.
    df = pd.read_csv('https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/whitehat-ds-datasets/adult.csv', header=None)

    # Create the list of column names and assign it to the DataFrame.
    column_name = ['age', 'workclass', 'fnlwgt', 'education', 'education-years', 'marital-status', 'occupation', 'relationship', 'race', 'gender', 'capital-gain', 'capital-loss', 'hours-per-week', 'native-country', 'income']
    df.columns = column_name

    # Replace the invalid values ' ?' (note the leading space) with 'np.nan'.
    df['native-country'] = df['native-country'].replace(' ?', np.nan)
    df['workclass'] = df['workclass'].replace(' ?', np.nan)
    df['occupation'] = df['occupation'].replace(' ?', np.nan)

    # Delete the rows with invalid values using the 'dropna()' function.
    df.dropna(inplace=True)

    # Delete the column that is not required using the 'drop()' function.
    df.drop(columns='fnlwgt', inplace=True)

    return df

census_df = load_data()

**After this step, the Python file should be created on the local system with the function that loads the data from the CSV file.**

---

#### Activity 2: Design the Web App

In this activity, you have to display the dataset in its raw form using the `st.dataframe()` function.

Follow the steps given below:

1. Add a checkbox widget to display the census-data DataFrame only when this checkbox is clicked.


2. When the checkbox is selected,

  - Display a subheader with a label `"Census Data set"`.

  - Below the subheader, display the original Adult Income dataset.
  - Also display the number of rows and columns of the dataset.


In [None]:
# Write the code to design the web app
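# 'st.sidebar' is a container and cannot be called directly, so add the sidebar title with 'st.sidebar.title()'.
st.sidebar.title('Census data')

# Display the raw data only when the checkbox in the sidebar is selected.
if st.sidebar.checkbox('Show raw data'):
    st.subheader('Census Data set')
    st.dataframe(census_df)
    st.write('The DataFrame has', census_df.shape[0], 'rows and', census_df.shape[1], 'columns.')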

**Note:** Perform the tasks in the `census_app.py` Python file in the **Sublime editor** and run your code from the command prompt or terminal (for example, with `streamlit run census_app.py`). Once you get the desired output, write the code in the code section given above.




---

**After this activity, the web app should be ready to showcase. In the next project, you will add more visualisations to this web app.**

---

### Submitting the Project:

1. After finishing the project, click on the **Share** button on the top right corner of the notebook. A new dialog box will appear.

  <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/2_share_button.png' width=500>

2. In the dialog box, make sure that the '**Anyone on the Internet with this link can view**' option is selected and then click on the **Copy link** button.

   <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/3_copy_link.png' width=500>

3. The link of the duplicate copy (named as **YYYY-MM-DD_StudentName_Project94**) of the notebook will get copied.

   <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/4_copy_link_confirmation.png' width=500>

4. Go to your dashboard and click on the **My Projects** option.
   
   <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/5_student_dashboard.png' width=800>

  <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/6_my_projects.png' width=800>

5. Click on the **View Project** button for the project you want to submit.

   <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/7_view_project.png' width=800>

6. Click on the **Submit Project Here** button.

   <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/8_submit_project.png' width=800>

7. Paste the link to the project file named as **YYYY-MM-DD_StudentName_Project94** in the URL box and then click on the **Submit** button.

   <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/9_enter_project_url.png' width=800>

----