### **Introduction**
To determine the "best neighborhood", we're figuring out which neighborhood is the most "green". We used four different datasets, each of which corresponds to a factor that makes a neighborhood more environmentally friendly.

### **The Metric**

**Note:** Each of these factors is essentially a count of something green in a neighborhood. We could simply add the counts together, but our group feels that certain factors matter more than others because they are rarer and/or have a bigger impact on the environment. Therefore, each count is multiplied by a weight before the counts are summed into a neighborhood's score.

##### **Dataset 1:** green-spaces-locations-pgh.csv 
- This is the list of each of the Green Spaces in Pittsburgh, as per the Operations Division of the Department of Public Works.
- The more green spaces a neighborhood has, the more green it is. 
- This is weighted by a factor of 2x.

##### **Dataset 2:** greenways-locations-pgh.csv
- This is the list of Greenways in Pittsburgh, given by Greenways.
- The more greenways a neighborhood has, the more green it is. 
- This is weighted by a factor of 3x.

##### **Dataset 3:** recyling-centers-locations-pgh.csv
- This is a list of locations where city residents are encouraged to drop off, dispose of, or recycle unwanted materials, given by Waste Recovery Locations.
- The more recycling centers a neighborhood has, the more green it is.
- This is weighted by a factor of 2x.

##### **Dataset 4:** smart-trash-locations-pgh.csv
- This is a list of locations of the City of Pittsburgh's Smart Trash Containers.
- The more smart trash containers a neighborhood has, the more green it is. 
- This is weighted by a factor of 1x.
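
To make the weighting concrete, here is a minimal sketch of how a single neighborhood's score could be computed from the four weights listed above. The function name `green_score` and the example counts are made up for illustration and do not come from the datasets.

```python
# Weights as stated in the dataset descriptions above
WEIGHTS = {
    "green_spaces": 2,
    "greenways": 3,
    "recycling_centers": 2,
    "smart_trash_containers": 1,
}

def green_score(counts):
    """Return the weighted sum of counts; factors missing from `counts` count as 0."""
    return sum(weight * counts.get(name, 0) for name, weight in WEIGHTS.items())

# Hypothetical neighborhood (made-up counts, not taken from the datasets)
example_counts = {"green_spaces": 4, "greenways": 1, "smart_trash_containers": 6}
example_score = green_score(example_counts)  # 2*4 + 3*1 + 2*0 + 1*6 = 17
```

Under this reading, one greenway adds as much to the score as three smart trash containers.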

### **The Best Neighborhood**

In [None]:
import pandas as pd

# Weight factors for each dataset
weight_factors = {
    'green_spaces': 2,
    'greenways': 3,
    'recycling_centers': 2,
    'smart_trash_containers': 1
}

# Read and process each dataset
datasets = {
    'green_spaces': pd.read_csv('green-spaces-locations-pgh.csv'),
    'greenways': pd.read_csv('greenways-locations-pgh.csv'),
    'recycling_centers': pd.read_csv('recyling-centers-locations-pgh.csv'),
    'smart_trash_containers': pd.read_csv('smart-trash-locations-pgh.csv')
}

# Calculate counts for each dataset
counts = {}
for dataset_name, dataset in datasets.items():
    if dataset_name == 'green_spaces':
        # Keep only rows where the Facility column is not empty
        green_spaces = dataset[dataset['Facility'].notnull()]
        # Count the number of green spaces in each neighborhood
        counts[dataset_name] = green_spaces['Neighborhood'].value_counts()
    elif dataset_name == 'greenways':
        # Count the number of greenways in each neighborhood
        counts[dataset_name] = dataset['nhood'].value_counts()
    elif dataset_name == 'recycling_centers':
        # Grouping by neighborhood and counting the number of entries for each neighborhood
        counts[dataset_name] = dataset.groupby('neighborhood').size()
    elif dataset_name == 'smart_trash_containers':
        # Grouping by neighborhood and counting the number of smart trash locations for each neighborhood
        counts[dataset_name] = dataset.groupby('neighborhood').size()

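# Calculate weighted scores for each neighborhood as the sum of (weight * count)
# over all datasets. Every neighborhood that appears in any dataset is scored,
# not only those that have at least one green space.
all_neighborhoods = set()
for dataset_count in counts.values():
    all_neighborhoods.update(dataset_count.index)

weighted_scores = {}
for neighborhood in all_neighborhoods:
    weighted_score = 0
    for dataset_name, dataset_count in counts.items():
        weighted_score += weight_factors.get(dataset_name, 1) * dataset_count.get(neighborhood, 0)
    weighted_scores[neighborhood] = weighted_score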

# Find the best neighborhood overall
best_neighborhood = max(weighted_scores, key=weighted_scores.get)
best_score = weighted_scores[best_neighborhood]

# Display the best neighborhood overall and its score
print("The best neighborhood overall based on the weighted factors is:", best_neighborhood)
print("Weighted score (overall):", best_score)
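
A weighted sum of per-neighborhood counts can also be computed without explicit loops by lining the counts up in one table. The sketch below is one possible way to do that, not necessarily how the cell above should be written. It uses small made-up count Series (neighborhoods `A`, `B`, `C`) in place of the real CSV files, and it assumes neighborhood names are spelled identically across datasets.

```python
import pandas as pd

# Weights from the metric described earlier
weights = pd.Series({
    'green_spaces': 2,
    'greenways': 3,
    'recycling_centers': 2,
    'smart_trash_containers': 1,
})

# Hypothetical per-neighborhood counts (illustrative only, not real data)
toy_counts = {
    'green_spaces': pd.Series({'A': 3, 'B': 1}),
    'greenways': pd.Series({'B': 2, 'C': 1}),
    'recycling_centers': pd.Series({'A': 1}),
    'smart_trash_containers': pd.Series({'A': 4, 'C': 5}),
}

# One row per neighborhood, one column per dataset; a missing entry means a count of 0
count_table = pd.DataFrame(toy_counts).fillna(0)

# Multiply each column by its weight, then sum across columns for each neighborhood
scores = count_table.mul(weights, axis='columns').sum(axis='columns')

# Rank all neighborhoods instead of reporting only the top one
ranking = scores.sort_values(ascending=False)
print(ranking)
```

Keeping the full ranking makes it easy to see ties and near-ties, which a single `max` call hides.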

### **Conclusion**