### Instructions

#### Goal of the Project

This project is designed for you to practice and solve the activities that are based on the concepts covered in the lesson:

  Multipage Streamlit App I

---

#### Getting Started:

1. Click on this link to open the Colab file for this project.

   https://colab.research.google.com/drive/1EXe-W2cfoypdHZ3L7DILHXDHYfnCje4N

2. Create a duplicate copy of the Colab file as described below.

  - Click on the **File menu**. A new drop-down list will appear.

   <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/lesson-0/0_file_menu.png' width=500>

  - Click on the **Save a copy in Drive** option. A duplicate copy will be created and will open in a new tab in your web browser.

  <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/lesson-0/1_create_colab_duplicate_copy.png' width=500>

3. After creating the duplicate copy of the notebook, please rename it in the **YYYY-MM-DD_StudentName_Project97** format.

4. Now, write your code in the prescribed code cells.


---

#### Problem Statement

In this project, you are going to create a Multipage Data Visualization Web app using the Streamlit framework.

This web app will do the following:

- Display the name of the app and provide a data description on the home page.

- Provide another web page that creates different types of charts or plots to find patterns (if any) in the data.



---

### Dataset Description

The dataset contains 32561 instances with 14 features and 1 target column, described below:

|Field|Description|
|---:|:---|
|age|age of the person, Integer.|
|work-class| employment information about the individual, Categorical.|
|fnlwgt| unknown weights, Integer.|
|education| highest level of education obtained, Categorical.|
|education-years|number of years of education, Integer.|
|marital-status| marital status of the person, Categorical.|
|occupation|job title, Categorical.|
|relationship| individual's relation in the family, like wife, husband, and so on, Categorical.|
|race| race of the person, Categorical.|
|sex| gender, Categorical, **Male** or **Female**.|
|capital-gain| gain from sources other than salary/wages, Integer.|
|capital-loss| loss from sources other than salary/wages, Integer.|
|hours-per-week| hours worked per week, Integer.|
|native-country| name of the native country, Categorical.|
|income-group| annual income, Categorical, **<=50K** or **>50K**.|


**Notes:**
1. The dataset has no header row, so the column names have to be added manually.
2. Invalid values in the dataset are marked as **"?"**.
3. As no information is available about **fnlwgt**, this column can be removed from the DataFrame.
4. Take note of the **whitespace (" ")** throughout the dataset: every value after the first one in a row starts with a space, so the invalid values appear as **" ?"**.
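Notes 1, 2 and 4 can also be handled while the file is being read. The sketch below is an alternative to the manual renaming and `' ?'` replacement used later in `load_data()`, not the lesson's method. It assumes pandas' `read_csv()` options `names`, `skipinitialspace=True` (drop the space after each comma) and `na_values='?'` (read `?` as missing), and it parses two illustrative rows held in a string instead of downloading the real file.

```python
import io

import pandas as pd

# Two illustrative rows in the raw file's layout: no header row, ", " between values.
# They are examples only; the second one uses "?" for missing values.
raw = io.StringIO(
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n"
    "52, ?, 209642, HS-grad, 9, Married-civ-spouse, ?, Husband, White, Male, 0, 0, 45, ?, >50K\n"
)

column_name = ['age', 'workclass', 'fnlwgt', 'education', 'education-years',
               'marital-status', 'occupation', 'relationship', 'race', 'gender',
               'capital-gain', 'capital-loss', 'hours-per-week', 'native-country', 'income']

# header=None + names: the file has no header row, so the names are supplied here.
# skipinitialspace=True: strip the space that follows each comma.
# na_values='?': after stripping, the "?" markers are read as NaN.
df = pd.read_csv(raw, header=None, names=column_name,
                 skipinitialspace=True, na_values='?')

# Same cleaning as in load_data(): drop the fnlwgt column and rows with missing values.
clean_df = df.drop(columns='fnlwgt').dropna()
```

With the values stripped this way, comparisons use `'?'` and `'>50K'` rather than `' ?'` and `' >50K'`, so the two approaches should not be mixed in one app.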



**Dataset Credits:** https://archive.ics.uci.edu/ml/datasets/adult

**Citation:**
```
Dua, D., & Graff, C. (2017). UCI Machine Learning Repository.
```

---

### List of Activities

**Activity 1:** Page Navigator
  
**Activity 2:** Home Page Configuration

**Activity 3:** Visualise Data Page Configuration

---

#### Creating Python File for the Census Visualisation Web App


In this step, create a Python file `census_main.py` in the Sublime Text editor and save it in the `Python_scripts` folder.

Copy the code given below into the `census_main.py` file. You have already seen this code: it defines a function that loads the data from the CSV file and cleans it.

**Dataset Link:** https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/whitehat-ds-datasets/adult.csv

**Note:** Do not run the code shown below in this notebook. It will throw an error.


In [None]:
# Open Sublime Text, create a new Python file, copy the following code into it and save it as 'census_main.py'.

# Import Streamlit and other required modules
import numpy as np
import pandas as pd
import streamlit as st

# Cache the loaded data so that the CSV file is not downloaded again on every rerun.
# ('st.cache_data' replaces the older 'st.cache' decorator in recent Streamlit versions.)
@st.cache_data
def load_data():
    # Load the Adult Income dataset into a DataFrame. The file has no header row.
    df = pd.read_csv('https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/whitehat-ds-datasets/adult.csv', header=None)

    # Column names, in the same order as the columns in the file.
    column_name = ['age', 'workclass', 'fnlwgt', 'education', 'education-years', 'marital-status', 'occupation', 'relationship', 'race', 'gender', 'capital-gain', 'capital-loss', 'hours-per-week', 'native-country', 'income']

    # Rename the columns 0, 1, 2, ... to the names in 'column_name' using 'rename()'.
    df.rename(columns=dict(enumerate(column_name)), inplace=True)

    # Replace the invalid values ' ?' (note the leading space) with 'np.nan'.
    df['native-country'] = df['native-country'].replace(' ?', np.nan)
    df['workclass'] = df['workclass'].replace(' ?', np.nan)
    df['occupation'] = df['occupation'].replace(' ?', np.nan)

    # Delete the rows with invalid values using 'dropna()'.
    df.dropna(inplace=True)

    # Delete the 'fnlwgt' column, which is not required, using 'drop()'.
    df.drop(columns='fnlwgt', inplace=True)

    return df

census_df = load_data()

---

#### Activity 1: Page Navigator

In this activity, you need to add a radio button widget to navigate between the **Home** and **Visualise Data** web pages of the web app, as shown in the image below:

<img src="https://i.imgur.com/QKpXBDX.png">

Create two empty Python files, `census_home.py` and `census_plots.py`, in the same folder as `census_main.py`.

- When a user selects the `Home` option, the `census_home.py` script will be rendered. It contains the code to display the title and description of the web app and to view the data.

- When a user selects the `Visualise Data` option, the `census_plots.py` script will be rendered. It contains the code to create different types of charts or plots to find patterns (if any) in the data.

To create this navigation bar, perform the following tasks:

1. Import the `streamlit` module and the `census_home` and `census_plots` modules (the `census_home.py` and `census_plots.py` files) in `census_main.py`.

2. Create a dictionary, say `pages_dict`, whose keys are the labels to be displayed in the navigation bar and whose values are the corresponding modules to be rendered:

  ```python
  pages_dict = {
      "Home": census_home,
      "Visualise Data": census_plots
  }
  ```

3. Add a title in the sidebar with the label `'Navigation'`.

4. Add a radio button widget in the sidebar with the label `'Go to'` and the keys of the `pages_dict` dictionary as its options. Pass these keys as a list or a tuple, for example `tuple(pages_dict.keys())`.

5. Store the current value of this radio button widget in a `user_choice` variable.

6. Obtain the value corresponding to the key stored in `user_choice` by looking it up in the `pages_dict` dictionary, and store it in a variable, say `selected_page`. It will hold either `census_home` or `census_plots`.

7. Call the user-defined `app()` function of `selected_page` and pass `census_df` to it, i.e. `selected_page.app(census_df)`.

**Note:** Do not forget to create the two empty Python files `census_home.py` and `census_plots.py` in the same folder as `census_main.py` before running `census_main.py`. Otherwise, you will get a `ModuleNotFoundError`.
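Without the Streamlit widgets, the navigation steps reduce to a dictionary lookup followed by a function call. In this minimal sketch, `SimpleNamespace` objects stand in for the imported `census_home` and `census_plots` modules, and `render()` is a hypothetical helper; in the real app, `user_choice` would be the value returned by `st.sidebar.radio()`.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for the census_home and census_plots modules.
# Like the real page files, each exposes an app(census_df) function.
census_home = SimpleNamespace(app=lambda census_df: f"Home page with {len(census_df)} rows")
census_plots = SimpleNamespace(app=lambda census_df: f"Plots page with {len(census_df)} rows")

# Navigation label -> module that renders that page.
pages_dict = {
    "Home": census_home,
    "Visualise Data": census_plots,
}

# The radio button's options: the dictionary keys, passed as a tuple.
options = tuple(pages_dict.keys())

def render(user_choice, census_df):
    """Look up the module for the selected label and call its app() with the data."""
    selected_page = pages_dict[user_choice]
    return selected_page.app(census_df)
```

Because every page module exposes the same `app(census_df)` signature, adding a third page only needs a new dictionary entry; the lookup-and-call code stays unchanged.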

In [None]:
# Create the page navigator for the 'Home' and 'Visualise Data' web pages in 'census_main.py'.
# Add this code below the data-loading code in 'census_main.py'.
import streamlit as st

# Import 'census_home.py' and 'census_plots.py' as modules.
import census_home
import census_plots

# Create a dictionary mapping each navigation label to the module that renders it.
pages_dict = {
    'Home': census_home,
    'Visualise Data': census_plots
}

# Add a title and radio buttons in the sidebar for navigation.
st.sidebar.title('Navigation')
user_choice = st.sidebar.radio('Go to', tuple(pages_dict.keys()))

# Call the 'app()' function of the selected page and pass the census data to it.
selected_page = pages_dict[user_choice]
selected_page.app(census_df)

**Expected Output:**

<img src="https://s3-whjr-v2-prod-bucket.whjr.online/10bdaded-01e0-4e41-9a54-11183fe8a96f.PNG"/>

After this activity, the user must be able to navigate between the Home and Visualise Data pages using the radio buttons in the sidebar.

---

#### Activity 2: Home Page Configuration

Open the blank `census_home.py` file that you created in the previous activity. Create a function `app()` in this file with `census_df` as its input and perform the following tasks within this `app()` function:

1. Add the code to display and hide the entire dataset using the `st.expander()` container (called `st.beta_expander()` in older Streamlit versions) and the `st.dataframe()` widget.
2. Show the dataset summary on the click of a checkbox.
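The **Show summary** checkbox relies on `DataFrame.describe()`, which by default summarises only the numeric columns. This small sketch uses a made-up sample (not the census data) to show which statistics it returns; the `50%` row is the median. It also shows the `dtypes.astype(str)` conversion that turns the column data types into plain text for display.

```python
import pandas as pd

# A tiny made-up sample with two numeric columns and one text column.
sample_df = pd.DataFrame({
    "age": [25, 38, 52, 41],
    "hours-per-week": [40, 50, 45, 40],
    "gender": ["Male", "Female", "Male", "Female"],
})

# Numeric columns only: count, mean, std, min, 25%, 50% (median), 75%, max.
summary = sample_df.describe()

# Column data types as strings, e.g. 'int64' for the numeric columns.
dtype_table = sample_df.dtypes.astype(str)
```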

In [None]:
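# Show the complete dataset and its summary in 'census_home.py'.
# Import necessary modules.
import streamlit as st

# Define a function 'app()' which accepts 'census_df' as an input.
def app(census_df):
    # Display the title and description of the web app.
    st.header("Census Visualisation Web App")
    st.text("This web app allows a user to explore and visualise the census data.")

    st.header("View Data")

    # Display the dataset within an expander ('st.beta_expander()' in older Streamlit versions).
    with st.expander("View Dataset"):
        st.dataframe(census_df)

    # Show the dataset summary on the click of a checkbox.
    st.subheader("Columns Description:")
    if st.checkbox("Show summary"):
        st.table(census_df.describe())

    # Show column names, data types and individual column data side by side
    # ('st.beta_columns()' in older Streamlit versions).
    col1, col2, col3 = st.columns(3)
    with col1:
        if st.checkbox("Show all column names"):
            st.table(list(census_df.columns))

    with col2:
        if st.checkbox("Show column data types"):
            # Convert the dtypes to strings so that they render as plain text.
            st.table(census_df.dtypes.astype(str))

    with col3:
        if st.checkbox("Show column data"):
            column_data = st.selectbox("Select a column", tuple(census_df.columns))
            st.write(census_df[column_data])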

**Expected Output:**


- View Full Dataset:
<img src="https://s3-whjr-v2-prod-bucket.whjr.online/49dd9d04-63ae-422e-ab06-2064a2a8a2f6.PNG"/>

- Show summary:

  <img src="https://s3-whjr-v2-prod-bucket.whjr.online/a4549b4d-c349-46d3-9538-c522cda0dbbe.PNG"/>


After this activity, the home page of the web app will allow the user to view the complete dataset as well as a summary of the dataset.

---

#### Activity 3: Visualise Data Page Configuration

Open the blank `census_plots.py` file that you created in **Activity 1: Page Navigator**. Create an `app()` function in this file with `census_df` as its input and perform the following tasks within this `app()` function:

1. Add the code to display the following charts/plots for the 'Visualise data' web page:

  - Chart/plot to display the distribution of records for the `income-group` feature.

  - Chart/plot to display the distribution of records for the `gender` feature.

  - Chart/plot to display the difference in the range of values for the `hours-per-week` feature for different income groups.

  - Chart/plot to display the difference in the range of values for the `hours-per-week` feature for different gender groups.

  - Chart/plot to display the number of records for each unique `workclass` value for different income groups.
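Each chart in this list draws a simple aggregation, and computing those numbers directly can help when checking a plot. This sketch uses a small made-up sample with the column names from `load_data()`, not the real census data: `value_counts()` gives the slices of a pie chart, `groupby(...).describe()` gives the quartiles a box plot draws, and `pd.crosstab()` gives the bar heights that `sns.countplot(x='workclass', hue='income', ...)` would draw.

```python
import pandas as pd

# A made-up sample with the columns used by the charts.
sample_df = pd.DataFrame({
    "workclass": ["Private", "Private", "Self-emp-inc", "State-gov", "Private", "Self-emp-inc"],
    "gender": ["Male", "Female", "Male", "Female", "Male", "Male"],
    "hours-per-week": [40, 38, 60, 40, 50, 55],
    "income": ["<=50K", "<=50K", ">50K", "<=50K", ">50K", ">50K"],
})

# Pie charts: share of records per income group, and record counts per gender.
income_share = sample_df["income"].value_counts(normalize=True)
gender_counts = sample_df["gender"].value_counts()

# Box plots: spread of hours-per-week within each income group and each gender.
hours_by_income = sample_df.groupby("income")["hours-per-week"].describe()
hours_by_gender = sample_df.groupby("gender")["hours-per-week"].median()

# Count plot with hue: records per workclass value, split by income group.
workclass_by_income = pd.crosstab(sample_df["workclass"], sample_df["income"])
```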

**Steps to follow:**

1. Create a user-defined function `app()` with `census_df` as its input in both `census_home.py` and `census_plots.py` to perform their respective tasks.

2. Inside the `app()` function of `census_home.py`, add the code to display the dataset and to view the **column names**, **column data types**, individual **column data**, and the **mean**, **median**, **quartiles**, and **standard deviation** of the numeric columns of the dataset.

3. Inside the `app()` function of `census_plots.py`, add the code to display the charts/plots listed in the first task of this activity for the 'Visualise data' web page.

In [None]:
# Code for the 'census_plots.py' file.
# Import necessary modules.
import streamlit as st
import matplotlib.pyplot as plt
import seaborn as sns

# Define a function 'app()' which accepts 'census_df' as an input.
def app(census_df):
    st.title('Visualise Data')

    # Add a multiselect in the sidebar with the label 'Select the Charts/Plots:'.
    # Store the current value of this widget in a variable 'plot_list'.
    plot_list = st.sidebar.multiselect('Select the Charts/Plots:', ('Count Plot', 'Pie Chart', 'Box Plot'))

    # Display a count plot of workclass values per income group using seaborn and 'st.pyplot()'.
    # Each plot gets its own figure, which is passed explicitly to 'st.pyplot()'.
    if 'Count Plot' in plot_list:
        st.subheader('Count Plot')
        fig, ax = plt.subplots(figsize=(12, 5))
        ax.set_title('Distribution of records for unique workclass groups by income group')
        sns.countplot(x='workclass', hue='income', data=census_df, ax=ax)
        ax.tick_params(axis='x', labelrotation=45)
        st.pyplot(fig)

    # Display a pie chart using matplotlib and 'st.pyplot()'.
    if 'Pie Chart' in plot_list:
        st.subheader('Pie Chart')
        pie_data = st.selectbox('Select the column for the pie chart', ('income', 'gender'))
        counts = census_df[pie_data].value_counts()
        fig, ax = plt.subplots(figsize=(5, 5))
        ax.set_title(f'Distribution of records for {pie_data}')
        ax.pie(counts, labels=counts.index, autopct='%1.2f%%', startangle=30)
        st.pyplot(fig)

    # Display a box plot of hours-per-week for each group using seaborn and 'st.pyplot()'.
    if 'Box Plot' in plot_list:
        st.subheader('Box Plot')
        column = st.selectbox('Select the column for the box plot', ('income', 'gender'))
        fig, ax = plt.subplots(figsize=(12, 3))
        ax.set_title(f'Distribution of hours-per-week for each {column} group')
        sns.boxplot(x='hours-per-week', y=column, data=census_df, ax=ax)
        st.pyplot(fig)


**Expected Output:**

<img src='https://s3-whjr-v2-prod-bucket.whjr.online/2c7a39bb-a26c-4775-bf8c-785af26dd075.gif'>

After this activity, the user must be able to visualise the dataset using various charts and plots to find patterns in the data (if any).


---

### Submitting the Project:

1. After finishing the project, click on the **Share** button in the top-right corner of the notebook. A new dialog box will appear.

  <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/2_share_button.png' width=500>

2. In the dialog box, make sure that '**Anyone on the Internet with this link can view**' option is selected and then click on the **Copy link** button.

   <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/3_copy_link.png' width=500>

3. The link to the duplicate copy (named **YYYY-MM-DD_StudentName_Project97**) of the notebook will be copied.

   <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/4_copy_link_confirmation.png' width=500>

4. Go to your dashboard and click on the **My Projects** option.
   
   <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/5_student_dashboard.png' width=800>

  <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/6_my_projects.png' width=800>

5. Click on the **View Project** button for the project you want to submit.

   <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/7_view_project.png' width=800>

6. Click on the **Submit Project Here** button.

   <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/8_submit_project.png' width=800>

7. Paste the link to the project file named **YYYY-MM-DD_StudentName_Project97** in the URL box and then click on the **Submit** button.

   <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/9_enter_project_url.png' width=800>

---