## Inspiration Index

- Data Sources:
  - `public_art`
  - `museums`
  - `libraries`
  - `parks_and_facilities`
  - `community_nonprofit_orgs`
  - `faith-based_facilities`

In [None]:
import pandas as pd
import numpy as np

# public art
public_art = pd.read_csv('../../data_cleaned/public_art.csv')
public_art_count = public_art['hood'].value_counts()
print("public_art:")
print("  mean:", public_art_count.mean())
print("  median:", public_art_count.median())
print("  max:", public_art_count.max())

# museums
museums = pd.read_csv('../../data_cleaned/museums.csv')
museums_count = museums['hood'].value_counts()
print("museums:")
print("  mean:", museums_count.mean())
print("  median:", museums_count.median())
print("  max:", museums_count.max())

# libraries
libraries = pd.read_csv('../../data_cleaned/libraries.csv')
libraries_count = libraries['hood'].value_counts()
print("libraries:")
print("  mean:", libraries_count.mean())
print("  median:", libraries_count.median())
print("  max:", libraries_count.max())

# parks
parks = pd.read_csv('../../data_cleaned/parks_and_facilities.csv')
parks_count = parks['hood'].value_counts()
print("parks:")
print("  mean:", parks_count.mean())
print("  median:", parks_count.median())
print("  max:", parks_count.max())

# nonprofit organizations
nonprofit_orgs = pd.read_csv('../../data_cleaned/community_nonprofit_orgs.csv')
nonprofit_orgs_count = nonprofit_orgs['hood'].value_counts()
print("nonprofit_orgs:")
print("  mean:", nonprofit_orgs_count.mean())
print("  median:", nonprofit_orgs_count.median())
print("  max:", nonprofit_orgs_count.max())

# faith-based facilities
faith_based = pd.read_csv('../../data_cleaned/faith-based_facilities.csv')
faith_based_count = faith_based['hood'].value_counts()
print("faith_based:")
print("  mean:", faith_based_count.mean())
print("  median:", faith_based_count.median())
print("  max:", faith_based_count.max())

public_art:
  mean: 3.5849056603773586
  median: 2.0
  max: 26
museums:
  mean: 1.25
  median: 1.0
  max: 3
libraries:
  mean: 2.1052631578947367
  median: 2.0
  max: 3
parks:
  mean: 7.5675675675675675
  median: 5.0
  max: 35
nonprofit_orgs:
  mean: 46.61797752808989
  median: 15.0
  max: 1535
faith_based:
  mean: 5.492537313432836
  median: 4.0
  max: 19
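
The six blocks above repeat the same three statistics for each dataset. As a minimal sketch, the same summary could come from one helper applied to each cleaned file. `summarize_counts` is a hypothetical name, not part of the notebook, and the small in-memory frame is made-up data standing in for a CSV with a `hood` column.

```python
import pandas as pd


def summarize_counts(df, column="hood"):
    """Mean, median, and max of rows per neighborhood (hypothetical helper)."""
    counts = df[column].value_counts()
    return {"mean": counts.mean(), "median": counts.median(), "max": counts.max()}


# Made-up stand-in data: neighborhood A has 3 rows, B has 1, C has 2.
sample = pd.DataFrame({"hood": ["A", "A", "A", "B", "C", "C"]})
stats = summarize_counts(sample)
print(stats)
```

In the notebook, the helper would be called once per dataset on the frame returned by `pd.read_csv`.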


#### Inspiration Index Calculation

- Based on the statistics above, we chose a typical per-neighborhood scale for the counts of `public art`, `museums`, `libraries`, `parks`, `nonprofit organizations`, and `faith-based facilities`: 3:1:2:5:15:4.
- The score is calculated as follows:
  - `Tastescape Score = (Public Art * 20 + Museums * 60 + Libraries * 30 + Parks * 12 + Nonprofit Organizations * 4 + Faith-based Facilities * 15)`
  - The multipliers balance the scale: each scale value times its multiplier equals 60 (for example, 3 * 20 and 15 * 4), so a typical neighborhood contributes equally from every category.
  - We apply a log (`np.log1p`) to every term to dampen the effect of outliers. Nonprofit counts are also capped at 200 before the log.
  - We give each category a bias for its contribution to inspiration: 3:3:1:1:2:1.
  - Finally, we min-max normalize the score using the minimum and maximum scores in the dataset. The code below scales it to the 0-1 range.
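
The steps above can be sketched on plain Python numbers. This is an illustration under stated assumptions, not the notebook's implementation: `inspiration_score` and `min_max` are hypothetical helpers, the sample neighborhoods are made up, and the cap of 200 mirrors the `clip(upper=200)` applied to nonprofit counts in the scoring cell.

```python
import math

# Category order, the 3:1:2:5:15:4 scale, the formula multipliers,
# and the 3:3:1:1:2:1 biases, all in the same order.
CATEGORIES = ["public_art", "museums", "libraries", "parks",
              "nonprofit_orgs", "faith_based"]
TYPICAL = [3, 1, 2, 5, 15, 4]
MULTIPLIERS = [20, 60, 30, 12, 4, 15]
BIASES = [3, 3, 1, 1, 2, 1]

# Product of each scale value and its multiplier.
balanced = [t * m for t, m in zip(TYPICAL, MULTIPLIERS)]


def inspiration_score(counts, nonprofit_cap=200):
    """Sum of bias * log1p(count * multiplier) over categories (hypothetical helper)."""
    total = 0.0
    for name, mult, bias in zip(CATEGORIES, MULTIPLIERS, BIASES):
        count = counts.get(name, 0)
        if name == "nonprofit_orgs":
            count = min(count, nonprofit_cap)  # assumed cap, as in clip(upper=200)
        total += bias * math.log1p(count * mult)
    return total


def min_max(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # avoid dividing by zero
    return [(v - lo) / (hi - lo) for v in values]


# Made-up neighborhoods: one with no amenities, one matching the scale.
typical = dict(zip(CATEGORIES, TYPICAL))
raw = [inspiration_score({}), inspiration_score(typical)]
print(balanced)
print(min_max(raw))
```

Because `log1p(0)` is 0, a neighborhood with no amenities scores exactly 0 before normalization, and the cap means any nonprofit count above 200 contributes the same amount as 200.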

In [15]:
# union all the index
all_hoods_index = public_art_count.index.union(
    museums_count.index
).union(
    libraries_count.index
).union(
    parks_count.index
).union(
    nonprofit_orgs_count.index
).union(
    faith_based_count.index
)
    

# create a new dataframe with the counts for each neighborhood
hood_counts = pd.DataFrame({
    'hood': all_hoods_index,
    'public_art_count': public_art_count.reindex(all_hoods_index, fill_value=0).values,
    'museums_count': museums_count.reindex(all_hoods_index, fill_value=0).values,
    'libraries_count': libraries_count.reindex(all_hoods_index, fill_value=0).values,
    'parks_count': parks_count.reindex(all_hoods_index, fill_value=0).values,
    'nonprofit_orgs_count': nonprofit_orgs_count.reindex(all_hoods_index, fill_value=0).values,
    'faith_based_count': faith_based_count.reindex(all_hoods_index, fill_value=0).values,
})
# print the first 10 rows of the new dataframe
print(hood_counts.head(10))

                hood  public_art_count  museums_count  libraries_count  \
0   Allegheny Center                18              1                0   
1     Allegheny West                 1              0                0   
2          Allentown                 2              0                0   
3          Arlington                 0              0                0   
4         Banksville                 0              0                0   
5  Bedford Dwellings                 0              0                0   
6          Beechview                 5              0                2   
7        Beltzhoover                 3              0                0   
8         Bloomfield                 1              0                3   
9              Bluff                 0              0                0   

   parks_count  nonprofit_orgs_count  faith_based_count  
0           16                  1535                  4  
1            0                    74                  4  
2          

In [17]:
# calculate the raw score for each neighborhood:
# bias * log1p(count * multiplier), summed over the six categories
# (nonprofit counts are capped at 200 before the log)
hood_counts['score'] = (np.log1p(hood_counts['public_art_count'] * 20) * 3 +
                        np.log1p(hood_counts['museums_count'] * 60) * 3 +
                        np.log1p(hood_counts['libraries_count'] * 30) * 1 +
                        np.log1p(hood_counts['parks_count'] * 12) * 1 +
                        np.log1p(hood_counts['nonprofit_orgs_count'].clip(upper=200) * 4) * 2 +
                        np.log1p(hood_counts['faith_based_count'] * 15) * 1)

# sort the dataframe by score in descending order
hood_counts = hood_counts.sort_values(by='score', ascending=False)

# print the first 10 rows (raw scores, before normalization)
print(hood_counts.head(10))

# min-max normalize the score to be between 0 and 1
score_min = hood_counts['score'].min()
score_max = hood_counts['score'].max()
hood_counts['score'] = (hood_counts['score'] - score_min) / (score_max - score_min)

# keep only the neighborhood and its score, then save to a csv file
tastescape_scores = hood_counts[['hood', 'score']]
print(tastescape_scores.head(10))
tastescape_scores.to_csv('../../data_score/inspiration_index.csv', index=False)

                         hood  public_art_count  museums_count  \
15  Central Business District                26              1   
0            Allegheny Center                18              1   
54              North Oakland                 1              3   
27               East Liberty                 1              1   
61               Point Breeze                 3              1   
75        Squirrel Hill South                22              0   
70           South Side Flats                10              0   
55                North Shore                 7              1   
6                   Beechview                 5              0   
37                  Hazelwood                 4              0   

    libraries_count  parks_count  nonprofit_orgs_count  faith_based_count  \
15                2            4                   660                 11   
0                 0           16                  1535                  4   
54                2            1          