# Trading Places
1. Read in data using Dask __(1 point)__
2. Descriptive Statistics: __(2 points)__
    
    a. In the year 2012, which 10 countries had the most trading partners? Which 10 had the fewest?
    
    b. Using the designation [here](https://www.foreign-trade.com/reference/hscode.htm), describe the trade volume of the whole dataset in terms of value and list the five highest-value sectors over the entire sample period.

3. Country statistics: __(3 points)__

    a. Calculate the top 10 exports (in terms of value) of our chosen country in 2012.

    b. Calculate the top 10 goods with the highest trade volume for both exports and imports in terms of:
    
        i. Value
    
        ii. Quantity

3. The Gravity Equation Relationship __(4 points)__

    a. Using the country shapefile, calculate the distance in kilometers between the centroid of our chosen country and all the other centroids. You may use either haversine distance with the geographic CRS or use a projected CRS and use the `distance` function of `geopandas`.
    
    b. Using the country we chose in class, create scatterplots of distance against export volume (both quantity and value), with all variables in logarithms.
    
    c. Calculate the correlations between exports and distance for both value and quantity. Interpret the correlations for each of these factors.
    
    d. Using the designation you created in 2.b., for which sectors are exports (quantity) most strongly related to distance for this country?
    
4. Estimation __(6 points)__
    
    a. Create a function called "SSE_1p" that does the following given a scalar input $\beta$:
        1. Subtracts $\beta$ times the logarithm of distance from the logarithm of export quantity:
$$ e = \log(E_{ij}) - \beta \log(d_{ij}) $$
        2. Squares the error.
        3. Returns the sum of the squared error. 
    b. Plot the function from -1 to +3 in steps of 0.005, and then find the minimum of the function with the numpy function `argmin`. Is the result what you expected?
    
    c. Create another function called "SSE_2p" that does the same as "SSE_1p" but instead takes a __vector__ $\beta$ as its argument:
 $$ e = \log(E_{ij}) - \beta[0] - \beta[1] \log(d_{ij})$$
 
     d. Do a grid search with the first parameter ranging from 1500 to 2500 in steps of 10 and the second parameter ranging from -1 to +3 in steps of 0.005, as before. Run it using either `dask` or `multiprocessing`. Comment on how and why the coefficient on $\log(d_{ij})$ differs from the one you found in part 4b.
     
     e. Now check your results using the `reg` function from the `econtools` package by running OLS with a constant term.
     
     f. Finally, merge in the GDP numbers from the World Bank and run the regression again, this time including the log of each country's GDP as a covariate. How did the coefficient on distance change?
     
5. An analyst asks you "for which commodity sectors is distance most important in terms of exporting?" Describe what your economic intuition would say about the answer, and then also map out an analysis strategy for answering their question. As usual, describe: __(4 points)__
    - What data you would need.
    - How you would manipulate the data.
    - What analysis you would run.
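
The one-parameter grid evaluation described in the Estimation step can be sketched as follows. This is a minimal sketch on synthetic data, not the trade dataset: the arrays `log_E` and `log_d`, the random seed, and the slope of 1.5 used to generate them are all assumptions made for illustration. It uses `np.linspace` with 801 points instead of a float-step `np.arange` so that both endpoints of the -1 to +3 range land on the grid.

```python
import numpy as np

# Synthetic stand-in data (assumption): NOT the trade dataset.
rng = np.random.default_rng(0)
log_d = np.log(rng.uniform(500.0, 15000.0, size=200))   # log distance (km)
log_E = 1.5 * log_d + rng.normal(0.0, 0.5, size=200)    # log export quantity


def SSE_1p(beta, log_E=log_E, log_d=log_d):
    """Sum of squared errors for e = log(E) - beta * log(d)."""
    e = log_E - beta * log_d
    return np.sum(e ** 2)


# Grid from -1 to +3 in steps of 0.005 (801 points including both ends).
grid = np.linspace(-1.0, 3.0, 801)
sse = np.array([SSE_1p(b) for b in grid])

# argmin returns the index of the smallest SSE; map it back to a beta value.
best_beta = grid[np.argmin(sse)]

# Closed-form minimizer of the one-parameter problem (no intercept),
# used here only as a check on the grid search.
beta_hat = np.sum(log_d * log_E) / np.sum(log_d ** 2)

print(best_beta, beta_hat)
```

Because the sum of squared errors is quadratic in $\beta$, the grid minimum should be the grid point closest to the closed-form value, so the two printed numbers should agree to within one grid step.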

In [1]:
import glob
import pandas as pd
import numpy as np
import dask.dataframe as dd
import matplotlib.pyplot as plt
import json
import dask
import geopandas as gp
from math import radians, cos, sin, asin, sqrt


def haversine(row):
    """
    Calculate the great-circle distance in kilometers between two points
    on the earth (specified in decimal degrees).
    `row` is a sequence ordered as (lon1, lat1, lon2, lat2).
    """
    lon1, lat1, lon2, lat2 = row
    # convert decimal degrees to radians 
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])

    # haversine formula 
    dlon = lon2 - lon1 
    dlat = lat2 - lat1 
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(min(1.0, sqrt(a)))  # clamp guards against rounding just above 1
    r = 6371 # Radius of earth in kilometers. Use 3956 for miles
    return c * r
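
When distances are needed from one centroid to many others, the same haversine formula can be applied to whole arrays at once. The sketch below is one possible vectorized variant, not part of the original notebook: the function name `haversine_np` and the example coordinates are hypothetical. The points sit on the equator so that the expected distances are simple multiples of $\pi r$.

```python
import numpy as np


def haversine_np(lon1, lat1, lon2, lat2, r=6371.0):
    """Great-circle distance in km; inputs in decimal degrees, scalars or arrays."""
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    # Clip guards against rounding pushing a marginally outside [0, 1].
    return 2 * r * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


# Hypothetical centroids (assumption): one origin and three destinations.
origin_lon, origin_lat = 0.0, 0.0
dest_lon = np.array([0.0, 90.0, 180.0])
dest_lat = np.array([0.0, 0.0, 0.0])

# Broadcasting pairs the single origin with every destination.
dist_km = haversine_np(origin_lon, origin_lat, dest_lon, dest_lat)
print(dist_km)
```

The three results correspond to zero distance, a quarter of the earth's circumference, and half of it, which makes the function easy to sanity-check before applying it to real centroid columns.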