# Analysis Challenge Assignment 1
### HUDK4050 Core methods in educational data mining

**Author**: Nicolás Dussaillant \
**Date**: 10/10/2021



## 1. Problem

Alex wants to choose a school to attend considering the following criteria:

1. Safety
2. Urban
3. Diversity
4. Quality





The data come from the College Scorecard (retrieved on 09-12-2015) and from the FBI table of offenses known to law enforcement by state and city for 2019. Both files were provided with the assignment, so for replication purposes we assume that anyone running the code already has them. Two additional files were needed to complete the task. Because we cannot assume that the person running the code has them, the code downloads them automatically, so an internet connection is required. The data preparation section explains why these two extra files were needed.

## 2. Criteria

To define each criterion we will do the following:

1. **Safety**: 
To define safety for each school we will take the Offenses Known to Law Enforcement database from the FBI (available [here](https://ucr.fbi.gov/crime-in-the-u.s/2019/crime-in-the-u.s.-2019/tables/table-8/table-8.xls/view)), add the Violent crime and Property crime fields, and divide the sum by the population of each city. That gives an index that increases as safety decreases. We will match that data with the reported city of each school campus.


2. **Urban**:
Alex wants a school in a big city, so we will treat that as a must: for the decision we will only consider schools located in big cities. We will take that data from the field called Locale of institution (LOCALE) and keep only the rows with the value "City: Large (population of 250,000 or more)" (coded as 11).


3. **Diversity**:
To measure diversity we will take the fields called Total share of enrollment of undergraduate degree-seeking students who are: White, Black, Hispanic, Asian, American Indian/Alaska Native, Native Hawaiian/Pacific Islander, two or more races, non-resident aliens, and race unknown. These fields are called: 
    - UGDS_WHITE
    - UGDS_BLACK
    - UGDS_HISP
    - UGDS_ASIAN
    - UGDS_AIAN
    - UGDS_NHPI
    - UGDS_2MOR
    - UGDS_NRA
    - UGDS_UNKN

    Using those fields, we will calculate the normalised Herfindahl–Hirschman index for diversity (explained in [this article](https://www.insurance.ca.gov/diversity/41-ISDGBD/GBDExternal/upload/McKinseyDivmatters-201501.pdf)).
    
    In short, for each school we will add the squares of all the group shares (that sum is the HHI) and then normalize it using this formula: $\text{NHHI} = \frac{\text{HHI}-\frac{1}{N}}{1-\frac{1}{N}}$ where N = 9 is the number of groups. In this case we will define a $\text{[Diversity index]} = 1 - \text{NHHI}$
    
    We will get a number that varies from 0 to 1, where 1 means that all groups have the same share of people, and 0 means that one group has 100\% of the people.


4. **Quality**:
To consider quality in our recommendation, we will take the following fields:
    - Admission rate (ADM_RATE): We consider that if a school admits fewer students compared to their applications it indicates that students prefer that school and it is more competitive to be admitted there.
    - Average faculty salary (AVGFACSAL): We consider that the average faculty salary could be related to the quality of the faculty, assuming that faculty members who earn more do so because they are better.
    - Proportion of faculty that is full-time (PFTFAC): We consider a higher proportion of full-time faculty as a benefit for quality, since it possibly means that faculty have more time to dedicate to students and the school.

To measure quality ($Q$) we will do the following arbitrary calculation:

$$Q=0.7\cdot\left(1-\text{ADM_RATE}\right)+0.1\cdot\frac{\text{AVGFACSAL} - \min\{\text{AVGFACSAL}\}}{\max\{\text{AVGFACSAL}\}-\min\{\text{AVGFACSAL}\}}+0.2\cdot\text{PFTFAC}$$

$Q$ is an index that varies from 0 to 1, where 0 is the worst quality and 1 is the best. It applies an arbitrary weight to each parameter and rescales the average faculty salary with min-max scaling so that it falls on a \[0,1\] scale.

Note: SAT and ACT results might be also good variables to consider, but as not all the schools have records for both variables, we will discard them this time.


5. **Other considerations**:
Additionally, we will consider the following:
- As Alex is referred to as "she", we will remove all schools flagged as men-only (MENONLY).
- Alex wants a degree in education, so we will only consider schools that have a percentage of degrees awarded in Education (PCIP13) greater than 0.
- Finally, in an arbitrary decision to work with comparable data, we will remove all the schools that have missing data in the fields that will be considered to filter the data and make the decision.
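
As a quick check of the diversity and quality formulas above, the following minimal sketch computes both indices in plain Python. The function names `diversity_index` and `quality_index`, and all input values, are hypothetical and chosen only for illustration; they are not taken from the College Scorecard.

```python
def diversity_index(shares):
    """Return 1 - NHHI for a list of group shares that sum to 1."""
    n = len(shares)
    hhi = sum(s ** 2 for s in shares)      # Herfindahl-Hirschman index
    nhhi = (hhi - 1 / n) / (1 - 1 / n)     # normalised to the [0, 1] range
    return 1 - nhhi


def quality_index(adm_rate, avgfacsal, pftfac, sal_min, sal_max):
    """Quality index Q with the 0.7 / 0.1 / 0.2 weights from the text."""
    salary_scaled = (avgfacsal - sal_min) / (sal_max - sal_min)  # min-max scaling
    return 0.7 * (1 - adm_rate) + 0.1 * salary_scaled + 0.2 * pftfac


# Hypothetical school with two groups of 50% each and seven empty groups
two_groups = [0.5, 0.5, 0, 0, 0, 0, 0, 0, 0]
print(round(diversity_index(two_groups), 4))   # 0.5625

# Hypothetical school where a single group holds 100% of the enrollment
one_group = [1, 0, 0, 0, 0, 0, 0, 0, 0]
print(diversity_index(one_group))              # 0.0

# Hypothetical school: 40% admission rate, salary halfway between the
# minimum and maximum of the sample, 90% full-time faculty
print(round(quality_index(0.4, 8000, 0.9, 4000, 12000), 2))   # 0.65
```

The two-group case shows that the index is not linear in the number of groups: HHI = 0.5, NHHI = 0.4375, so the diversity index is 0.5625.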


## 3. Method

To make the recommendation we will follow these steps, considering the criteria explained above:

0. Import the Python libraries that we will use for data handling, calculations, and algorithms (in this case `numpy`, `pandas`, `pymcdm`, `requests`).
1. Import data from both data files, keeping only the fields needed to give relevant information to Alex and those required for the decision criteria.
2. Join both dataframes and filter out all the rows that won't be considered.
3. Calculate the index fields: quality, safety and diversity.
4. Define weights for each criterion index (arbitrarily define how important each criterion is for Alex; this can be adjusted).
5. Use the multiple-criteria decision-making method called PROMETHEE II (this method uses criteria weights and normalized data, and it calculates distances between alternatives on each criterion to rank all the schools; more details in **Brans, J. P., Vincke, P., & Mareschal, B. (1986). How to select and how to rank projects: The PROMETHEE method. European journal of operational research, 24(2), 228-238.**).
6. Deliver the top 15 ranked schools for this purpose.

## 4. Pre-procedure

First of all, the recommendation and ranking method used in this document relies on an external Python package called `pymcdm` (multi-criteria decision-making) that is not available in conda's or conda-forge's repositories, but it can still be installed in a conda environment. The package is available in the Python Package Index (PyPI) and the best way to install it is through `pip`. To do this, follow these steps:

1. Install `pip` with the following command: `conda install pip` (conda may conflict with `pip` because both are package managers, so it is suggested to install `pip` inside a specific conda environment using the `-n` option in the previous command; this is not strictly necessary, and you can use the default base environment).
2. Install the `pymcdm` package using `pip install pymcdm`.

\* These steps were run on a UNIX-based OS (specifically macOS, but they should work exactly the same on Linux). To do it on Windows you might need to check [this](https://stackoverflow.com/a/43729857) (I'm not sure whether it is strictly necessary or whether the previous steps work without any extra consideration).
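
Before running the notebook, it may help to confirm that the four packages are importable in the active environment. The sketch below uses only the standard library; the function name `missing_packages` is hypothetical.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the names that cannot be imported in the current environment."""
    return [name for name in names if find_spec(name) is None]


required = ["numpy", "pandas", "pymcdm", "requests"]
missing = missing_packages(required)
if missing:
    print("Install with pip:", " ".join(missing))
else:
    print("All required packages are available.")
```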

## 5. Procedure

Import the packages that will be used:

In [1]:
import numpy as np
import pandas as pd
import pymcdm as mc
import requests

### 5.1 Data preparation

Create two functions that will help to process the imported data, casting each column to the appropriate data type

In [2]:
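# Converter for numeric spreadsheet cells: anything that is not a number becomes NaN
convnum = lambda x : int(x) if type(x) in [int, float] else np.nan
# Converter for numeric text fields: empty strings become NaN
convstr = lambda x : int(float(x)) if x != '' else np.nan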
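
The behaviour of the two converters can be checked on a few sample values. In this standalone sketch they are re-declared as regular functions and use `float("nan")` so that the snippet runs without NumPy; the sample inputs are hypothetical.

```python
import math


def convnum(x):
    """Keep numeric cells as integers; map anything else to NaN."""
    return int(x) if type(x) in (int, float) else float("nan")


def convstr(x):
    """Parse a numeric string as an integer; map empty strings to NaN."""
    return int(float(x)) if x != "" else float("nan")


print(convstr("11"))       # 11
print(convstr("7079.0"))   # 7079
print(convstr(""))         # nan
print(convnum(85670.0))    # 85670
print(convnum("-"))        # nan
```

Note that `convstr` still raises a `ValueError` for non-empty strings that are not numbers, so it assumes missing values are encoded as empty strings.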

**College data**: Import College Scorecard data considering the columns that we will use and their data types

In [3]:
# Read the csv
college_data = pd.read_csv('CollegeScorecard.csv',
                            sep = ',',
                            usecols = ['INSTNM', 'CITY', 'STABBR', 'INSTURL', 'LOCALE', 'LATITUDE', 'LONGITUDE', 
                                        'UGDS_BLACK', 'UGDS_HISP', 'UGDS_ASIAN', 'UGDS_AIAN', 'UGDS_NHPI', 'UGDS_2MOR', 'UGDS_NRA', 'UGDS_UNKN', 'UGDS_WHITE',
                                        'ADM_RATE', 'AVGFACSAL', 'PFTFAC', 'MENONLY', 'PCIP13'],
                            dtype = {
                                    'INSTNM' : str,
                                    'CITY' : str,
                                    'STABBR' : str,
                                    'INSTURL' : str,
                                    'LATITUDE' : float,
                                    'LONGITUDE' : float,
                                    'UGDS_WHITE' : float,
                                    'UGDS_BLACK' : float,
                                    'UGDS_HISP' : float, 
                                    'UGDS_ASIAN' : float, 
                                    'UGDS_AIAN' : float, 
                                    'UGDS_NHPI' : float, 
                                    'UGDS_2MOR' : float, 
                                    'UGDS_NRA' : float, 
                                    'UGDS_UNKN' : float,
                                    'ADM_RATE' : float,
                                    'PFTFAC' : float,
                                    'PCIP13' : float
                                },
                            converters = {
                                    'LOCALE' : convstr,
                                    'MENONLY' : convstr, 
                                    'AVGFACSAL' : convstr
                                }
                            )
display(college_data)


| | INSTNM | CITY | STABBR | INSTURL | LOCALE | LATITUDE | LONGITUDE | MENONLY | ADM_RATE | PCIP13 | ... | UGDS_BLACK | UGDS_HISP | UGDS_ASIAN | UGDS_AIAN | UGDS_NHPI | UGDS_2MOR | UGDS_NRA | UGDS_UNKN | AVGFACSAL | PFTFAC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Alabama A & M University | Normal | AL | www.aamu.edu/ | 12.0 | 34.7834 | -86.5685 | 0.0 | 0.8989 | 0.1490 | ... | 0.9501 | 0.0089 | 0.0022 | 0.0012 | 0.0010 | 0.0000 | 0.0002 | 0.0084 | 7079.0 | 0.8856 |
| 1 | University of Alabama at Birmingham | Birmingham | AL | www.uab.edu | 12.0 | 33.5022 | -86.8092 | 0.0 | 0.8673 | 0.0862 | ... | 0.2590 | 0.0258 | 0.0518 | 0.0026 | 0.0007 | 0.0344 | 0.0140 | 0.0130 | 10170.0 | 0.9106 |
| 2 | Amridge University | Montgomery | AL | www.amridgeuniversity.edu | 12.0 | 32.3626 | -86.1740 | 0.0 | NaN | 0.0000 | ... | 0.4224 | 0.0093 | 0.0031 | 0.0031 | 0.0031 | 0.0000 | 0.0000 | 0.2671 | 3849.0 | 0.6721 |
| 3 | University of Alabama in Huntsville | Huntsville | AL | www.uah.edu | 12.0 | 34.7228 | -86.6384 | 0.0 | 0.8062 | 0.0173 | ... | 0.1310 | 0.0338 | 0.0364 | 0.0145 | 0.0002 | 0.0161 | 0.0329 | 0.0338 | 9341.0 | 0.6555 |
| 4 | Alabama State University | Montgomery | AL | www.alasu.edu/email/index.aspx | 12.0 | 32.3643 | -86.2957 | 0.0 | 0.5125 | 0.2150 | ... | 0.9285 | 0.0114 | 0.0015 | 0.0009 | 0.0007 | 0.0064 | 0.0207 | 0.0138 | 6557.0 | 0.6641 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 7799 | Georgia Military College-Columbus Campus | Columbus | GA | http://columbus.gmc.cc.ga.us/ | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 7800 | Georgia Military College-Valdosta Campus | Valdosta | GA | http://valdosta.gmc.cc.ga.us/ | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 7801 | Georgia Military College-Warner Robins Campus | Warner Robins | GA | http://robins.gmc.cc.ga.us/ | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 7802 | Georgia Military College-Online | Milledgeville | GA | http://online.gmc.cc.ga.us/ | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |


Filter out the schools that are not in big cities, do not award education degrees, or are men-only. We also remove every row with missing data, as well as schools in Puerto Rico. Finally, we change some city names, because those places are actually part of larger cities; this lets us merge the schools with the crime data.

In [4]:
# Filtering for initial criteria
no_big_city = college_data['LOCALE'] != 11
no_education = ~(college_data['PCIP13'] > 0)
menonly = college_data['MENONLY'] != 0

mask = no_big_city | no_education | menonly

# Filtering rows with missing data (missing data is computed as NaN)
for c in college_data.columns:
    mask = mask | pd.isna(college_data[c])

# Remove Schools from Puerto Rico
mask = mask | (college_data['STABBR'] == 'PR')

college_data.drop(college_data[mask].index, inplace=True)

# Adjusting some city names
college_data.update(college_data.loc[college_data['STABBR'] == 'NY'].replace(to_replace = ['Brooklyn',
                                                                    'Brooklyn Heights',
                                                                    'Flushing', 
                                                                    'Queens', 
                                                                    'Bronx', 
                                                                    'Staten Island', 
                                                                    'Jamaica',
                                                                    'Riverdale'], value = 'New York'))
college_data.update(college_data.loc[college_data['STABBR'] == 'CA'].replace(to_replace = ['Northridge', 'La Jolla'], 
                                                                                value = ['Los Angeles', 'San Diego']))

display(college_data)

| | INSTNM | CITY | STABBR | INSTURL | LOCALE | LATITUDE | LONGITUDE | MENONLY | ADM_RATE | PCIP13 | ... | UGDS_BLACK | UGDS_HISP | UGDS_ASIAN | UGDS_AIAN | UGDS_NHPI | UGDS_2MOR | UGDS_NRA | UGDS_UNKN | AVGFACSAL | PFTFAC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 64 | Alaska Pacific University | Anchorage | AK | www.alaskapacific.edu | 11.0 | 61.1912 | -149.8042 | 0.0 | 0.3745 | 0.0625 | ... | 0.0294 | 0.0353 | 0.0176 | 0.1706 | 0.0059 | 0.0971 | 0.0000 | 0.1000 | 5137.0 | 1.0000 |
| 85 | University of Arizona | Tucson | AZ | www.arizona.edu | 11.0 | 32.2321 | -110.9508 | 0.0 | 0.7692 | 0.0502 | ... | 0.0333 | 0.2444 | 0.0561 | 0.0106 | 0.0021 | 0.0417 | 0.0550 | 0.0104 | 10320.0 | 0.7478 |
| 100 | Grand Canyon University | Phoenix | AZ | www.gcu.edu | 11.0 | 33.5122 | -112.1299 | 0.0 | 0.5480 | 0.1437 | ... | 0.2075 | 0.1452 | 0.0437 | 0.0090 | 0.0046 | 0.0252 | 0.0001 | 0.1220 | 5473.0 | 0.1175 |
| 127 | Arizona Christian University | Phoenix | AZ | www.arizonachristian.edu | 11.0 | 33.5951 | -112.0262 | 0.0 | 0.6076 | 0.2857 | ... | 0.0509 | 0.1426 | 0.0143 | 0.0081 | 0.0020 | 0.0346 | 0.0244 | 0.1324 | 4617.0 | 0.2051 |
| 230 | California Baptist University | Riverside | CA | www.calbaptist.edu | 11.0 | 33.9286 | -117.4259 | 0.0 | 0.7880 | 0.0452 | ... | 0.0840 | 0.3012 | 0.0497 | 0.0071 | 0.0098 | 0.0285 | 0.0191 | 0.0526 | 8355.0 | 0.4779 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4177 | Marquette University | Milwaukee | WI | www.marquette.edu | 11.0 | 43.0388 | -87.9281 | 0.0 | 0.5745 | 0.0526 | ... | 0.0454 | 0.0933 | 0.0463 | 0.0018 | 0.0011 | 0.0330 | 0.0348 | 0.0026 | 9665.0 | 0.5544 |
| 4185 | Mount Mary University | Milwaukee | WI | www.mtmary.edu/ | 11.0 | 43.0723 | -88.0306 | 0.0 | 0.5397 | 0.0321 | ... | 0.2207 | 0.1443 | 0.0666 | 0.0049 | 0.0012 | 0.0259 | 0.0049 | 0.0086 | 5671.0 | 0.8642 |
| 4211 | Wisconsin Lutheran College | Milwaukee | WI | wlc.edu | 11.0 | 43.0369 | -88.0227 | 0.0 | 0.6356 | 0.0659 | ... | 0.0558 | 0.0463 | 0.0085 | 0.0066 | 0.0000 | 0.0208 | 0.0255 | 0.0113 | 6106.0 | 0.9605 |
| 4218 | University of Wisconsin-Milwaukee | Milwaukee | WI | www.uwm.edu | 11.0 | 43.0768 | -87.8805 | 0.0 | 0.8608 | 0.0745 | ... | 0.0822 | 0.0759 | 0.0596 | 0.0043 | 0.0008 | 0.0314 | 0.0271 | 0.0017 | 8037.0 | 0.9599 |


**Crime data**: Import crime data for each city in 2019, clean values, and remove rows with missing data.

In [5]:
# Read excel
crime_2019 = pd.read_excel('Table_8_Offenses_Known_to_Law_Enforcement_by_State_by_City_2019.xls',
                            header = 3,
                            skipfooter = 8,
                            usecols = ['State', 'City', 'Population', 'Violent\ncrime', 'Property\ncrime'],
                            sheet_name = '19tbl08',
                            converters = {
                                'Population' : convnum,
                                'Violent\ncrime' : convnum,
                                'Property\ncrime' : convnum
                                }
                            )
# Fill combined excel cells and remove numbers from city and state names
crime_2019['State'] = crime_2019['State'].ffill()
crime_2019['State'] = crime_2019['State'].str.replace(r'\d+', '', regex=True)
crime_2019['City'] = crime_2019['City'].str.replace(r'\d+', '', regex=True)

# Remove rows with missing data
mask = pd.isna(crime_2019['Population']) | pd.isna(crime_2019['Violent\ncrime']) | pd.isna(crime_2019['Property\ncrime'])

crime_2019.drop(crime_2019[mask].index, inplace=True)
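
To illustrate the cleaning steps in the cell above, the following sketch applies the same forward fill and digit removal to a tiny hand-made table. The rows are hypothetical and only mimic the spreadsheet layout, where the state name appears once per block of cities and some names carry footnote digits.

```python
import pandas as pd

# Hypothetical rows: merged state cells arrive as missing values,
# and footnote markers arrive as trailing digits
toy = pd.DataFrame({
    "State": ["ALABAMA4", None, "ALASKA", None],
    "City": ["Hoover", "Mobile3", "Anchorage", "Bethel"],
})

# Propagate each state name down to the cities listed under it
toy["State"] = toy["State"].ffill()

# Strip the footnote digits from state and city names
toy["State"] = toy["State"].str.replace(r"\d+", "", regex=True)
toy["City"] = toy["City"].str.replace(r"\d+", "", regex=True)

print(toy)
```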

The 2019 database does not include every city (cities from Alabama in particular are missing), so we use the 2018 data to complement it. As the 2018 database was not among the assignment files, the following code downloads it. We apply the same data cleaning as for the 2019 database.

In [6]:
# Get the excel file for 2018
url = 'https://ucr.fbi.gov/crime-in-the-u.s/2018/crime-in-the-u.s.-2018/tables/table-8/table-8.xls'
req = requests.get(url)

# Read file
crime_2018 = pd.read_excel(req.content,
                            header = 3,
                            skipfooter = 8,
                            usecols = ['State', 'City', 'Population', 'Violent\ncrime', 'Property\ncrime'],
                            sheet_name = '18tbl08',
                            converters = {
                                'Population' : convnum,
                                'Violent\ncrime' : convnum,
                                'Property\ncrime' : convnum
                                }
                            )

# Same procedure as with the 2019 data
crime_2018['State'] = crime_2018['State'].ffill()
crime_2018['State'] = crime_2018['State'].str.replace(r'\d+', '', regex=True)
crime_2018['City'] = crime_2018['City'].str.replace(r'\d+', '', regex=True)

mask = pd.isna(crime_2018['Population']) | pd.isna(crime_2018['Violent\ncrime']) | pd.isna(crime_2018['Property\ncrime'])
crime_2018.drop(crime_2018[mask].index, inplace=True)

Now we combine both dataframes, adding to the 2019 data only the cities that appear in 2018 but not in 2019. We also rename some cities to match the College Scorecard.

In [7]:
# Remove cities from 2018 table that are already in the 2019 table
crime_remove_index = crime_2018[(crime_2018['State'] + crime_2018['City']).isin(crime_2019['State'] + crime_2019['City'])].index
crime_2018.drop(crime_remove_index, inplace=True)

# Add the remaining 2018 rows to the 2019
crime_data = pd.concat([crime_2019, crime_2018], ignore_index=True)

# Replace some city names
crime_data.replace(to_replace = ['Metropolitan Nashville Police Department', 
                                    'Charlotte-Mecklenburg', 
                                    'Louisville Metro',
                                    'St. Paul',
                                    'St. Louis'],
                                    value = ['Nashville', 'Charlotte', 'Louisville', 'Saint Paul', 'Saint Louis'], inplace = True)


The College Scorecard uses state codes, but the crime databases use state names. To handle that, we download a file that maps state names to codes and merge the `crime_data` DataFrame with it. After that, our crime data is complete.

In [8]:
# Get state codes
states = pd.read_json('https://gist.githubusercontent.com/mshafrir/2646763/raw/8b0dbb93521f5d6889502305335104218454c2bf/states_titlecase.json')

# Merge the crime dataframe with the corresponding state code
crime = crime_data.merge(   right = states,
                            how = 'left',
                            left_on = crime_data['State'].str.lower(),
                            right_on = states['name'].str.lower()
                        )

display(crime)

| | key_0 | State | City | Population | Violent\ncrime | Property\ncrime | name | abbreviation |
|---|---|---|---|---|---|---|---|---|
| 0 | alabama | ALABAMA | Hoover | 85670.0 | 114.0 | 1922.0 | Alabama | AL |
| 1 | alaska | ALASKA | Anchorage | 287731.0 | 3581.0 | 12261.0 | Alaska | AK |
| 2 | alaska | ALASKA | Bethel | 6544.0 | 130.0 | 132.0 | Alaska | AK |
| 3 | alaska | ALASKA | Bristol Bay Borough | 852.0 | 2.0 | 20.0 | Alaska | AK |
| 4 | alaska | ALASKA | Cordova | 2150.0 | 0.0 | 7.0 | Alaska | AK |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9822 | wisconsin | WISCONSIN | Monticello | 1204.0 | 0.0 | 1.0 | Wisconsin | WI |
| 9823 | wisconsin | WISCONSIN | New Lisbon | 2500.0 | 8.0 | 8.0 | Wisconsin | WI |
| 9824 | wisconsin | WISCONSIN | Shawano | 8936.0 | 34.0 | 300.0 | Wisconsin | WI |
| 9825 | wisconsin | WISCONSIN | Wilton | 498.0 | 0.0 | 2.0 | Wisconsin | WI |


**Merge data**: Now we merge crimes with college information to have our complete data

In [9]:
final_data = college_data.merge(  right = crime,
                                    how = 'left',
                                    left_on = ['STABBR', 'CITY'],
                                    right_on = ['abbreviation', 'City']
                                )
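
Because this is a left merge, a school whose city has no match in the crime table keeps its row but gets missing crime values. One way to spot such rows is the `indicator` option of `DataFrame.merge`. The sketch below uses two hypothetical toy tables (`toy_colleges`, `toy_crime`) rather than the real data.

```python
import pandas as pd

# Hypothetical tables: the second school's city is not in the crime table
toy_colleges = pd.DataFrame({
    "INSTNM": ["School A", "School B"],
    "STABBR": ["NY", "CA"],
    "CITY": ["New York", "La Jolla"],
})
toy_crime = pd.DataFrame({
    "abbreviation": ["NY", "CA"],
    "City": ["New York", "San Diego"],
    "Population": [8379043, 1441737],
})

merged = toy_colleges.merge(
    toy_crime,
    how="left",
    left_on=["STABBR", "CITY"],
    right_on=["abbreviation", "City"],
    indicator=True,   # adds a "_merge" column describing where each row came from
)

# Rows marked "left_only" found no matching city in the crime table
unmatched = merged.loc[merged["_merge"] == "left_only", ["INSTNM", "STABBR", "CITY"]]
print(unmatched)
```

Rows flagged this way are the ones that a city rename, such as mapping La Jolla to San Diego, is meant to resolve.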

**Calculate decision fields**: Here we calculate the Quality index, Diversity index, and Crime index as explained at the beginning of this document.

In [10]:
# Calculate the quality index using the formula from the Criteria section
final_data['Quality index'] = 0.7 * (1 - final_data['ADM_RATE']) + \
                        0.1 * (final_data['AVGFACSAL'] - min(final_data['AVGFACSAL'])) / (max(final_data['AVGFACSAL']) - min(final_data['AVGFACSAL'])) + \
                        0.2 * final_data['PFTFAC']

# Calculate the diversity index using the formula from the Criteria section
final_data['Diversity index'] = 0
groups = ['UGDS_BLACK', 'UGDS_HISP', 'UGDS_ASIAN', 'UGDS_AIAN', 'UGDS_NHPI', 'UGDS_2MOR', 'UGDS_NRA', 'UGDS_UNKN', 'UGDS_WHITE']
for g in groups:
    final_data['Diversity index'] += final_data[g] ** 2

N = len(groups)
final_data['Diversity index'] = 1 - (final_data['Diversity index'] - 1/N) / (1 - 1/N)

# Calculate the crime index as defined in the Criteria section
final_data['Crime index'] = (final_data['Violent\ncrime'] + final_data['Property\ncrime']) / final_data['Population']


**Data selection**: Finally we select only the data that we are going to use to calculate the recommendation, and use appropriate column names.

In [11]:
dt = pd.DataFrame(data = final_data,
                    columns = ['INSTNM', 'INSTURL', 'name', 'STABBR', 'City', 'Population', 'LONGITUDE', 'LATITUDE', 'Crime index', 'Diversity index', 'Quality index'])

dt.columns = ['University', 'Webpage', 'State', 'State code', 'City', 'City population', 'Long', 'Lat', 'Crime index', 'Diversity index', 'Quality index']

# Change population data type to int
dt['City population'] = dt['City population'].astype(np.int64)

display(dt)

| | University | Webpage | State | State code | City | City population | Long | Lat | Crime index | Diversity index | Quality index |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Alaska Pacific University | www.alaskapacific.edu | Alaska | AK | Anchorage | 287731 | -149.8042 | 61.1912 | 0.055058 | 0.734588 | 0.661912 |
| 1 | University of Arizona | www.arizona.edu | Arizona | AZ | Tucson | 548374 | -110.9508 | 32.2321 | 0.039604 | 0.711530 | 0.382619 |
| 2 | Grand Canyon University | www.gcu.edu | Arizona | AZ | Phoenix | 1688722 | -112.1299 | 33.5122 | 0.040135 | 0.812640 | 0.367037 |
| 3 | Arizona Christian University | www.arizonachristian.edu | Arizona | AZ | Phoenix | 1688722 | -112.0262 | 33.5951 | 0.040135 | 0.684753 | 0.335003 |
| 4 | California Baptist University | www.calbaptist.edu | California | CA | Riverside | 333260 | -117.4259 | 33.9286 | 0.034436 | 0.781828 | 0.297495 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 204 | Marquette University | www.marquette.edu | Wisconsin | WI | Milwaukee | 590923 | -87.9281 | 43.0388 | 0.038873 | 0.488826 | 0.474234 |
| 205 | Mount Mary University | www.mtmary.edu/ | Wisconsin | WI | Milwaukee | 590923 | -88.0306 | 43.0723 | 0.038873 | 0.733409 | 0.523999 |
| 206 | Wisconsin Lutheran College | wlc.edu | Wisconsin | WI | Milwaukee | 590923 | -88.0227 | 43.0369 | 0.038873 | 0.351705 | 0.480111 |
| 207 | University of Wisconsin-Milwaukee | www.uwm.edu | Wisconsin | WI | Milwaukee | 590923 | -87.8805 | 43.0768 | 0.038873 | 0.526611 | 0.340024 |


### 5.2 Data processing

Here we run the recommendation method using the Crime index, Quality index, and Diversity index.
This method requires a weight for every variable, reflecting the importance that the decision maker gives to each one. The weights can be any positive number.

For this case we use Crime=2, Diversity=4, and Quality=4.

In [12]:
# Define the weights
weights = np.array([2, 4, 4])

# Define the type of variable: -1 is cost and 1 is profit. In this case Crime is cost (larger crime index is worse) and the others are profits.
types = np.array([-1, 1, 1])

# Prepare data
data_decision = dt[['Crime index', 'Diversity index', 'Quality index']].to_numpy(dtype = np.float64)

# Get the function that will implement the model. We decided to use VShape criteria (details can be seen in the reference for Promethee II method)
prometheeII = mc.methods.PROMETHEE_II('vshape')

# Normalize data to adjust every variable into a [0,1] interval (minmax method is used)
norm_data = mc.normalizations.normalize_matrix(matrix = data_decision, method = mc.normalizations.minmax_normalization, criteria_types = None)

# Run the method (p values of 1 were selected to discriminate with a slope of 1 in the VShape criteria)
pref = prometheeII(matrix = norm_data, weights = weights, types = types, p = [1, 1, 1])

# Calculate ranking
ranking = mc.helpers.rrankdata(pref)

# Add ranking to the DataFrame
dt.insert(0, 'Ranking', ranking.astype(np.int64))


### 5.3 Results

Now we present the results, delivering the top 15 universities according to all the data selection and processing described above.

In [13]:
# Sort DataFrame considering the calculated ranking
dt.sort_values('Ranking', inplace = True)

# Select the top 15
top_colleges = dt.head(15).set_index('Ranking')

display(top_colleges)


| Ranking | University | Webpage | State | State code | City | City population | Long | Lat | Crime index | Diversity index | Quality index |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | New York University | www.nyu.edu | New York | NY | New York | 8379043 | -73.9973 | 40.7295 | 0.020303 | 0.871489 | 0.69855 |
| 2 | University of California-San Diego | www.ucsd.edu | California | CA | San Diego | 1441737 | -117.2378 | 32.8766 | 0.022442 | 0.834415 | 0.701561 |
| 3 | New Jersey City University | www.njcu.edu | New Jersey | NJ | Jersey City | 266508 | -74.0873 | 40.7099 | 0.024266 | 0.854879 | 0.667272 |
| 4 | San Diego State University | www.sdsu.edu | California | CA | San Diego | 1441737 | -117.0712 | 32.7753 | 0.022442 | 0.839501 | 0.651124 |
| 5 | CUNY Hunter College | www.hunter.cuny.edu | New York | NY | New York | 8379043 | -73.9648 | 40.7687 | 0.020303 | 0.859744 | 0.615044 |
| 6 | CUNY City College | www.ccny.cuny.edu | New York | NY | New York | 8379043 | -73.9506 | 40.8198 | 0.020303 | 0.869813 | 0.594498 |
| 7 | CUNY Brooklyn College | www.brooklyn.cuny.edu | New York | NY | New York | 8379043 | -73.9531 | 40.6319 | 0.020303 | 0.837601 | 0.612275 |
| 8 | Texas Wesleyan University | www.txwes.edu | Texas | TX | Fort Worth | 915237 | -97.2796 | 32.7333 | 0.031328 | 0.873981 | 0.64707 |
| 9 | Nevada State College | nsc.edu | Nevada | NV | Henderson | 317732 | -114.9389 | 35.9872 | 0.019189 | 0.825822 | 0.59374 |
| 10 | CUNY Queens College | www.qc.cuny.edu | New York | NY | New York | 8379043 | -73.8154 | 40.7378 | 0.020303 | 0.827554 | 0.568584 |


---
### 5.4 Alternative results presentation
To change the weights and the number of recommended universities, and to display a map, go to this website: [https://alexs-decision.herokuapp.com/](https://alexs-decision.herokuapp.com/)

Note: This part was an extra personal challenge to learn how to build a Flask app on Heroku, possibly just for this one time.

---