# INFOSCI 2950: Final Project Phase II Submission

---
## Research Questions

**Main Goal: How can we define Asian-American communities in terms of health, happiness, and financial security?**

*Financial Security*
1. Do some ethnic groups of Asian-Americans in Texas earn more than others?
2. Do younger Asian-Americans earn more money than older generations of Asian Americans living in Texas?
3. How does household size relate to duration of residency for AAs in Texas?

*Happiness*
1. Are some ethnic groups more likely to be born in the United States than others?
2. What are the most dominant religions among ethnic groups in Asian-American communities in Texas?
3. How is English speaking ability related to English difficulty among AA communities in Texas?
4. Are certain religious groups within AAs more likely to experience discrimination than other religious groups within AAs in Texas?
5. Are certain ethnic groups within AAs more likely to experience discrimination than other ethnic groups within AAs in Texas?

*Health*
1. Is smoking among Asian-American populations linked to heart disease?
2. Are older generations of AAs more likely to follow a healthy diet than younger generations of AAs in Texas?



---

## Data Collection and Cleaning
### Data Collection
1. Go to data [landing page](https://data.austintexas.gov/City-Government/Final-Report-of-the-Asian-American-Quality-of-Life/hc5t-p62z). 
2. Click on Export > CSV.
3. Download publicly available `Final_Report_of_the_Asian_American_Quality_of_Life__AAQoL_.csv` into desired directory.

### Data Cleaning
1. Store raw data into a preliminary dataframe, `df`
2. Convert column names into snake_case

In [None]:
## <TO-DO> Paste affiliated code here!



In [11]:
## load libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [13]:
## Step 1
df = pd.read_csv('Final_Report_of_the_Asian_American_Quality_of_Life__AAQoL_.csv')

In [15]:
## Step 2
new_colnames = [i.lower() for i in df.columns]
new_colnames = [i.replace(" ","_") for i in new_colnames]
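
As a quick illustration of what Step 2 produces, the sketch below applies the same two transformations to a toy dataframe. The three headers are hypothetical stand-ins chosen to show edge cases: a hyphen survives the conversion, and a trailing space becomes a trailing underscore, which would be consistent with the `discrimination_` and `physical_check-up` names that appear in Step 4 (an inference, not verified against the raw CSV). The stricter variant at the end is an optional alternative, not the approach used in this notebook.

```python
import pandas as pd

# Toy frame with hypothetical headers (assumed, not read from the real CSV).
demo = pd.DataFrame(columns=["US Born", "Physical Check-up", "Discrimination "])

# Same two transformations as Step 2: lowercase, then spaces -> underscores.
demo_colnames = [name.lower().replace(" ", "_") for name in demo.columns]
print(demo_colnames)

# Stricter variant: strip surrounding whitespace first and also replace hyphens.
strict_colnames = [
    name.strip().lower().replace(" ", "_").replace("-", "_")
    for name in demo.columns
]
print(strict_colnames)
```

With the stricter variant, the manual `rename` in Step 4 might not be needed, although the resulting name would be `physical_check_up` rather than `physical_checkup`.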

#### Column Names
3. Copy `df` into a new dataframe, `asian`, and assign the new column names to it
4. Select the following columns from the `asian` dataframe for data analysis:
- [ ] Income
- [ ] Retired
- [ ] US Born
- [ ] English Speaking
- [ ] English Difficulties
- [ ] Ethnicity
- [ ] Age
- [ ] Regular Exercise
- [ ] Healthy Diet
- [ ] Heart Disease
- [ ] Drinking
- [ ] Smoking
- [ ] Cancer
- [ ] Health Insurance
- [ ] Physical Check-up
- [ ] Quality of Life
- [ ] Religion
- [ ] Gender
- [ ] Close Friends
- [ ] Discrimination
- [ ] Duration of Residency
- [ ] Household Size
- [ ] Education Completed
- [ ] Cleaning Entries

In [None]:
## <TO-DO> Paste affiliated code here!



In [19]:
## Step 3
asian = df.copy()
asian.columns = new_colnames

In [20]:
## Step 4
asian = asian[['income', 'retired', 'us_born', 'english_speaking', 'english_difficulties', 'ethnicity','age', 'regular_exercise', 'healthy_diet', 'heart_disease', 'drinking', 'smoking',
               'cancer', 'health_insurance', 'physical_check-up', 'quality_of_life', 'religion', 'gender', 'close_friends', 'discrimination_', 'duration_of_residency', 'household_size',
              'education_completed']]
# Additional improvements to asian
asian = asian.rename(columns = {'discrimination_':'discrimination', 'physical_check-up':'physical_checkup'})

#### Cleaning Entries
5. For the corresponding columns, convert null data according to the table below:

| Column                | Modification to NaNs           |
|-----------------------|--------------------------------|
| Duration of residency | -1                             |
| Education completed   | -1                             |
| Discrimination        | 0                              |
| Household size        | 0                              |
| Quality of Life       | middle value for responses (5) |
| Gender                | "Unknown"                      |
| Ethnicity             | "Unknown"                      |
| Religion              | "Unknown"                      |
| Age                   | median age                     |
| Close Friends         | median number of friends       |
| Income                | median income bracket          |
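
The table above can be expressed as a single dictionary passed to `fillna`, which keeps the rules in one place. This is a minimal sketch on an invented toy dataframe, not the project's actual cleaning code: the values are made up, the medians are computed from the toy data, and `income` is left out because its median bracket depends on how the income labels are encoded.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the `asian` dataframe; every value here is invented.
toy = pd.DataFrame({
    "duration_of_residency": [12.0, np.nan, 3.5],
    "education_completed": [16.0, np.nan, 12.0],
    "discrimination": [np.nan, 1.0, 0.0],
    "household_size": [4.0, np.nan, 2.0],
    "quality_of_life": [8.0, np.nan, 6.0],
    "gender": ["Female", None, "Male"],
    "ethnicity": ["Chinese", "Vietnamese", None],
    "religion": [None, "Buddhist", "Christian"],
    "age": [30.0, np.nan, 50.0],
    "close_friends": [2.0, 6.0, np.nan],
})

# Constant replacements taken from the table above.
fill_values = {
    "duration_of_residency": -1,
    "education_completed": -1,
    "discrimination": 0,
    "household_size": 0,
    "quality_of_life": 5,
    "gender": "Unknown",
    "ethnicity": "Unknown",
    "religion": "Unknown",
}

# Medians are computed from the observed (non-null) values of each column.
for col in ["age", "close_friends"]:
    fill_values[col] = toy[col].median()

# fillna with a dict applies each replacement only to its own column.
toy_filled = toy.fillna(value=fill_values)
print(toy_filled)
```

Computing the medians from the data, rather than typing them in, keeps the fill values correct if rows are filtered before this step.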

In [None]:
## <TO-DO> Paste affiliated code here!



In [None]:
## Step 5

# encode english_speaking on an ordinal 1-4 scale, stored as floats
# (assigning the result back avoids relying on an in-place replace of a column selection)
asian['english_speaking'] = asian['english_speaking'].replace({'Not at all': 1, 'Not well': 2, 'Well': 3, 'Very well': 4})
asian['english_speaking'] = asian['english_speaking'].astype(float, errors = 'raise')
# replacing NaNs with 0
asian['english_speaking'] = asian['english_speaking'].fillna(0)
    
# encode english_difficulties on an ordinal 1-4 scale
asian['english_difficulties'] = asian['english_difficulties'].replace({'Not at all': 1, 'Not much': 2, 'Much': 3, 'Very much': 4})
# replacing NaNs with 0
asian['english_difficulties'] = asian['english_difficulties'].fillna(0)

# changing retired column to be represented by floats (1 = retired)
asian["retired"] = asian["retired"].replace({"Retired": 1})
asian["retired"] = asian["retired"].astype(float, errors='raise')
# replacing NaNs with 0
asian['retired'] = asian['retired'].fillna(0)

# changing us_born column to 0 (No) and 1 (Yes)
asian["us_born"] = asian["us_born"].replace({"No": 0, "Yes": 1})
# replacing NaNs with 0
asian['us_born'] = asian['us_born'].fillna(0)

# replacing NaNs with 0, then encoding 'Yes' as the integer 1 in health_insurance
asian['health_insurance'] = asian['health_insurance'].fillna(0)
asian['health_insurance'] = asian['health_insurance'].replace({"Yes": 1})

# replacing NaNs with 0, then encoding 'Yes' as the integer 1 in physical_checkup
asian['physical_checkup'] = asian['physical_checkup'].fillna(0)
asian['physical_checkup'] = asian['physical_checkup'].replace({"Yes": 1})

# replacing NaNs with 0 in regular_exercise
asian['regular_exercise'] = asian['regular_exercise'].fillna(0)

# replacing NaNs with 0 in healthy_diet
asian['healthy_diet'] = asian['healthy_diet'].fillna(0)

# replacing NaNs with 0 in heart_disease
asian['heart_disease'] = asian['heart_disease'].fillna(0)

#replacing NaNs with 5.0
asian['quality_of_life'] = asian['quality_of_life'].fillna(5.0)


#replacing NaNs with 0
asian['discrimination'] = asian['discrimination'].fillna(0)


#replacing NaNs with -1
asian['duration_of_residency'] = asian['duration_of_residency'].fillna(-1)


# replacing NaNs with 0; a real household_size cannot be 0 because participants count themselves, so 0 safely marks missing values
asian['household_size'] = asian['household_size'].fillna(0)


# replacing NaNs with -1 (as specified in the table above)
asian['education_completed'] = asian['education_completed'].fillna(-1)


#replacing NaNs with Unknown
asian['gender'] = asian['gender'].fillna('Unknown')

#replacing NaNs with Unknown
asian['ethnicity'] = asian['ethnicity'].fillna('Unknown')

#replacing NaNs with Unknown
asian['religion'] = asian['religion'].fillna('Unknown')

#replacing NaNs with median age
asian['age'] = asian['age'].fillna(40.0)

#replacing NaNs with median number of close friends
asian['close_friends'] = asian['close_friends'].fillna(3.0)

asian = asian.dropna()
# asian = asian.reset_index

asian.head()

6. Convert yes/no columns to binary responses, with 1 indicating "Yes" and 0 indicating "No"
7. Address the `income` column's formatting issues
8. Convert column data into the intended data types:

| Column                | Data Type                                                                          |
|-----------------------|------------------------------------------------------------------------------------|
| income                | String                                                                             |
| new_income            | int: {0 (for NaNs), 1 (0-9999), 2 (10000-19999), ..., 7 (60000-69999), 8 (70000+)} |
| retired               | int: {0 for Noes, 1 for Yeses}                                                     |
| US Born               | int: {0 for Noes, 1 for Yeses}                                                     |
| English Speaking      |                                                                                    |
| English Difficulties  |                                                                                    |
| Ethnicity             | String                                                                             |
| Age                   | int                                                                                |
| Regular Exercise      | int: {0 for Noes, 1 for Yeses}                                                     |
| Healthy Diet          | int: {0 for Noes, 1 for Yeses}                                                     |
| Heart Disease         | int: {0 for Noes, 1 for Yeses}                                                     |
| Drinking              | int: {0 for Noes, 1 for Yeses}                                                     |
| Smoking               | int: {0 for Noes, 1 for Yeses}                                                     |
| Cancer                | int: {0 for Noes, 1 for Yeses}                                                     |
| Health Insurance      | int: {0 for Noes, 1 for Yeses}                                                     |
| Physical Check-up     | int: {0 for Noes, 1 for Yeses}                                                     |
| Quality of Life       |                                                                                    |
| Religion              |                                                                                    |
| Gender                |                                                                                    |
| Close Friends         |                                                                                    |
| Discrimination        |                                                                                    |
| Duration of Residency |                                                                                    |
| Household Size        |                                                                                    |
| Education Completed   |                                                                                    |
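
Steps 6 and 7 could be sketched as two small helper functions. Both `encode_yes_no` and `encode_income` are hypothetical names introduced here, and the income labels in the toy data are an assumed format, since the raw strings in the `income` column are not shown in this notebook. The income helper reads the lower bound of each label and maps it onto the `new_income` codes from the table above (0 for NaNs, 1 for 0-9999, up to 8 for 70000+).

```python
import re

import numpy as np
import pandas as pd


def encode_yes_no(series: pd.Series) -> pd.Series:
    """Map 'Yes' to 1 and everything else (including 'No' and NaN) to 0."""
    return (series == "Yes").astype(int)


def encode_income(label) -> int:
    """Return 0 for a missing label, otherwise a bracket code from 1 to 8."""
    if pd.isna(label):
        return 0
    # Drop thousands separators, then read the first number as the lower bound.
    numbers = re.findall(r"\d+", str(label).replace(",", ""))
    if not numbers:
        return 0
    lower_bound = int(numbers[0])
    # Each bracket spans 10,000; everything from 70,000 up shares code 8.
    return min(lower_bound // 10000 + 1, 8)


# Toy data; the income label format is an assumption made for illustration.
toy = pd.DataFrame({
    "smoking": ["Yes", "No", np.nan, "Yes", "No"],
    "income": ["$0 - $9,999", "$10,000 - $19,999", "$60,000 - $69,999",
               "$70,000 and over", np.nan],
})
toy["smoking"] = encode_yes_no(toy["smoking"])
toy["new_income"] = toy["income"].apply(encode_income)
print(toy)
```

Treating missing yes/no answers as 0 matches the NaN handling used in Step 5, but it does merge "No" and "no answer" into one value, which is worth keeping in mind when reading percentages.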


In [None]:
## <TO-DO> Paste affiliated code here!



---

## Data Description
The observations are the Asian-Americans surveyed in Austin, Texas. The attributes are a range of quality-of-life measurements. The rapid growth rate of the Asian American population and its unique challenges as a new immigrant group call for a better understanding of its social and health needs, which this dataset helps provide. The dataset was funded by the City of Austin’s Public Information Office.

Both researchers and participants influenced what data was observed and recorded. During the study, participants elevated their responses by adding “respect and appreciation of diverse cultures and acknowledgement of the legacy of the Asian community in Austin.” The data that was recorded reflects the project's goal of improving the City of Austin’s Asian American resources on health, housing, culture, civic engagement, and economic development.

The data was gathered through a three-year community engagement process. Commissioners, consultants, and City staff worked with partner agencies and individual volunteers to meet Asian American community members where they live, work, and play. More than 3,350 individuals took one of two surveys during the initiative, either online or in person at a variety of locations throughout the city, at “Conversation Over Tea,” and at other City of Austin hosted meetings held to facilitate dialogue and share anecdotes about their lives. Additionally, “travel booths” at various events throughout the city engaged hundreds of Asian Americans. Participation included almost every ZIP code within the city and adjacent areas.

The people involved were aware of the data collection. They were surprised to learn that the City of Austin would use the data to hear about their dreams, challenges, and contributions. The raw data can be found [here](https://data.austintexas.gov/City-Government/Final-Report-of-the-Asian-American-Quality-of-Life/hc5t-p62z), under Export > CSV.

---

## Data Limitations
The dataset contains few continuous quantitative variables, which will make it difficult to generate traditional-looking scatter plots. Survey questions were answered on different scales, so responses are not directly comparable across questions. The data represents only the attitudes of Asian-Americans surveyed in Austin, Texas. Certain ethnic groups are more heavily represented than others, which may skew results when extrapolating quality-of-life measurements. Some response variables will also be affected by confounding variables such as cultural differences among ethnic subgroups (e.g., some ethnic groups have leaner diets, which would affect the conclusions drawn).

---

## Exploratory Data Analysis
### Outline

#### Scatterplots
- [ ] Income vs Age
- [ ] Smoking vs Heart Disease
- [ ] English Speaking vs English difficulty
- [ ] Household Size vs Duration of Residency

#### Bar Charts
- [ ] Median Income Brackets per Ethnicity
- [ ] Percentage of US-born per Ethnicity
- [ ] Percentage of Religion per Ethnicity
- [ ] Percentage of Discrimination per Ethnicity
- [ ] Percentage of Discrimination per Religion

#### Histograms
- [ ] Age and Healthy Diet
  - Age ranges grouped by yes/healthy diet
  - Age ranges grouped by no/healthy diet

### Sample Summary Statistics

Median household income for Asian Americans living in Austin, Texas
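
One possible way to compute this statistic, assuming income has already been encoded into the `new_income` bracket codes described in the data-type table (0 for NaNs, 1 through 8 for the brackets). The codes below are invented toy values, not survey results.

```python
import pandas as pd

# Hypothetical bracket codes following the `new_income` scheme.
toy = pd.DataFrame({"new_income": [0, 3, 5, 8, 8, 2]})

# Exclude the 0 code so missing responses do not pull the median down.
reported = toy.loc[toy["new_income"] > 0, "new_income"]
median_bracket = reported.median()
print(median_bracket)
```

Because the codes are ordinal, the median identifies a bracket rather than a dollar amount, and with an even number of respondents it can fall halfway between two brackets.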

In [None]:
## <TO-DO> Write affiliated code here!



### Sample Relevant Plots

Distribution of ethnicities among Asian-Americans living in Austin, Texas
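
A minimal sketch of how this plot might be produced with `value_counts` and a bar chart. The ethnicity labels and counts below are invented for illustration; in the notebook the same calls would run on `asian['ethnicity']`.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy data; labels and frequencies are made up, not taken from the survey.
toy = pd.DataFrame({
    "ethnicity": ["Chinese", "Vietnamese", "Chinese", "Korean",
                  "Asian Indian", "Chinese", "Filipino", "Vietnamese"],
})

# Count respondents per ethnicity, sorted from most to least common.
ethnicity_counts = toy["ethnicity"].value_counts()

fig, ax = plt.subplots(figsize=(8, 4))
ethnicity_counts.plot(kind="bar", ax=ax)
ax.set_xlabel("Ethnicity")
ax.set_ylabel("Number of respondents")
ax.set_title("Distribution of ethnicities (toy data)")
fig.tight_layout()
```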

In [None]:
## <TO-DO> Write affiliated code here!



Distribution of religious groups by ethnicity in Asian-American communities in Austin, Texas
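
One way to sketch this plot is a row-normalized cross-tabulation drawn as a stacked bar chart, so that each ethnicity's bar sums to 100% regardless of how many respondents it has. The rows below are invented toy data; in the notebook the inputs would be `asian['ethnicity']` and `asian['religion']`.

```python
import pandas as pd

# Toy data; the ethnicity and religion values are made up for illustration.
toy = pd.DataFrame({
    "ethnicity": ["Chinese", "Chinese", "Chinese", "Chinese",
                  "Vietnamese", "Vietnamese"],
    "religion": ["Buddhist", "Unknown", "Christian", "Buddhist",
                 "Buddhist", "Catholic"],
})

# normalize="index" turns each row into proportions that sum to 1.
religion_share = pd.crosstab(toy["ethnicity"], toy["religion"], normalize="index")

ax = religion_share.plot(kind="bar", stacked=True, figsize=(8, 4))
ax.set_xlabel("Ethnicity")
ax.set_ylabel("Share of respondents")
ax.set_title("Religion by ethnicity (toy data)")
```

Normalizing by row addresses the limitation that some ethnic groups are much larger than others, since raw counts would let the largest group dominate the chart.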

In [None]:
## <TO-DO> Write affiliated code here!



---

## Questions for Reviewers

