# Trading Places
In this homework, you will use the BACI bilateral trade flows dataset to analyze trade flows for the world and, more specifically, for the country of __Uruguay__. You will also do a rudimentary analysis of what is referred to as the "gravity equation" from international trade theory.

### **Formatting (2 points)**
- Label each of your questions with a header like below ("Question 1"), putting the code cells below each header.
    - See Homework 1 Solution for an example
- Link each header to a Table of Contents which you'll create at the top of your document, using these instructions:
    - https://www.geeksforgeeks.org/how-to-add-a-table-of-contents-in-the-jupyter-notebook/
    - See Homework 1 Solution for an example
- Within each header, make subheaders that link to specific outputs.
    - e.g. "Plot of 2017 and 2018 Rasters" under Question 1

### Question 1: Reading Data __(1 point)__
Read in every year of available data (2012 to 2020) using Dask.

### Question 2: Descriptive Statistics __(3 points)__
    
A. In the year 2012, who were the top 10 countries with the most trading partners? What about the bottom 10?
    
B. Using the designation [here](https://www.foreign-trade.com/reference/hscode.htm), describe the trade volume of the whole dataset in terms of value and list the five highest-value sectors over the entire sample period.

C. Calculate the top 10 goods (level of k) with the highest trade volume in the entire dataset in terms of:
    
  1. Value
    
  2. Quantity
  
 
### Question 3: Country Statistics __(2 points)__

A. Calculate the top 10 exports (in terms of value) of Uruguay in 2012.

B. Using the aggregated categories from 2B, find the product category with the highest annual average value from 2012 to 2020 in Uruguay and plot imports and exports of this category from 2012 to 2020.



### Question 4: The Gravity Equation Relationship __(4 points)__

A. __(1 point)__ Using the country shapefile, calculate the distance in kilometers between the centroid of Uruguay and the centroid of each of its trading partners, using haversine distance or Euclidean distance (depending on what projection you are using).
    
B. __(1 point)__ Create a scatterplot of distance and trade flow volume (quantity and value) in logarithmic form.
    
C. __(1 point)__ Calculate the correlations between trade volume and distance for both value and quantity. Interpret the correlations for each of these factors.
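
As a rough illustration of part C, the sketch below computes the Pearson correlation between log distance and log trade value. The arrays `distance_km` and `value` are made-up placeholders, not BACI data; in the homework they would come from the Uruguay flows merged with the distances from part A.

```python
import numpy as np

# Hypothetical placeholder data: four partners, farther ones trade less.
distance_km = np.array([500.0, 1000.0, 5000.0, 10000.0])
value = np.array([900.0, 400.0, 50.0, 20.0])

# Correlate in logs, matching the log-log scatterplot from part B.
log_d = np.log(distance_km)
log_v = np.log(value)

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry
# is the Pearson correlation between the two series.
corr = np.corrcoef(log_d, log_v)[0, 1]
print(f"corr(log distance, log value) = {corr:.3f}")
```

The same two lines can be repeated with quantity in place of value. A negative correlation would be consistent with the gravity equation, in which flows fall as distance rises.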
    
D. __(1 point)__ Using the designation you created in 2B, for which sectors are trade flows (quantity) most strongly related to distance for this country? In other words, calculate the correlation between trade flow and distance for each category of product for Uruguay.
    
### Question 5: Estimation __(5 points)__

For all of the following parts, use the Uruguay trade flows (both exports and imports).
    
A. __(1 point)__ Create a function called "SSE_1p" that does the following given a scalar input $\beta$:
1. Subtracts $\beta$ times the logarithm of distance from the logarithm of quantity flow: $e = \log(F_{ij}) - \beta \log(d_{ij})$
2. Squares the error.
3. Returns the sum of the squared errors.
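
The three steps above can be sketched as follows. This is a minimal illustration, not a reference solution: the module-level arrays `log_F` and `log_d` are hypothetical placeholders for the log quantity flows and log distances of Uruguay's trading partners.

```python
import numpy as np

# Hypothetical placeholder data standing in for Uruguay's flows.
log_F = np.log(np.array([8.0, 4.0, 2.0]))  # log quantity flow
log_d = np.log(np.array([1.0, 2.0, 4.0]))  # log distance

def SSE_1p(beta):
    """Sum of squared errors for e = log(F) - beta * log(d)."""
    e = log_F - beta * log_d   # step 1: the error for each country pair
    return np.sum(e ** 2)      # steps 2 and 3: square, then sum

print(SSE_1p(0.0), SSE_1p(-1.0))
```

Reading the data from module-level arrays keeps the function's signature to a single scalar, which is what a one-dimensional grid search needs.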
    
B. __(1 point)__ Plot the function for $\beta$ from -1 to +3 in steps of .001, and then find the minimum of the function with the numpy function "argmin." Is it what you expected?
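
A sketch of the grid search, assuming an `SSE_1p` like the one described in part A. The data here are synthetic and constructed so that the true coefficient is 0.5, which makes it easy to check that `argmin` picks the right grid point; real trade data will give a different answer.

```python
import numpy as np

# Hypothetical data built so that log(F) = 0.5 * log(d) exactly.
log_d = np.log(np.array([10.0, 100.0, 1000.0]))
log_F = 0.5 * log_d

def SSE_1p(beta):
    e = log_F - beta * log_d
    return np.sum(e ** 2)

# Grid from -1 to +3 in steps of 0.001. np.arange excludes the stop value,
# so half a step is added to keep +3 in the grid.
betas = np.arange(-1.0, 3.0 + 0.0005, 0.001)
sse = np.array([SSE_1p(b) for b in betas])

# argmin returns the index of the smallest SSE, not the beta itself.
best_beta = betas[np.argmin(sse)]
print(f"beta minimizing SSE: {best_beta:.3f}")
```

For the plot, `plt.plot(betas, sse)` with `matplotlib.pyplot` imported as `plt` would draw the curve; it is left out here so the sketch runs without a display.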
    
C. __(1 point)__ Time the above grid search, which is done sequentially. Then, use `dask` or `multiprocess` to parallelize the grid search and time it again. What are the differences?
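
One possible way to set up the timing comparison is sketched below. It swaps in the standard library's `multiprocessing.pool.ThreadPool` rather than `dask` or `multiprocess`, purely to keep the example dependency-free; the data are hypothetical, and the pool size of 4 is an arbitrary assumption.

```python
import time
from multiprocessing.pool import ThreadPool

import numpy as np

# Hypothetical placeholder data.
log_d = np.log(np.array([10.0, 100.0, 1000.0]))
log_F = 0.5 * log_d

def SSE_1p(beta):
    e = log_F - beta * log_d
    return np.sum(e ** 2)

betas = np.arange(-1.0, 3.0 + 0.0005, 0.001)

# Sequential grid search.
start = time.perf_counter()
sse_seq = [SSE_1p(b) for b in betas]
t_seq = time.perf_counter() - start

# Same grid search mapped over a small thread pool (4 workers, arbitrary).
start = time.perf_counter()
with ThreadPool(4) as pool:
    sse_par = pool.map(SSE_1p, betas)
t_par = time.perf_counter() - start

print(f"sequential: {t_seq:.4f}s, thread pool: {t_par:.4f}s")
```

With a function this cheap, the parallel version may well be slower, because the overhead of dispatching each task can outweigh the work itself. That trade-off is part of what the question asks you to discuss.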
    
D. __EXTRA CREDIT (4 points)__ 
   1. Create another function called "SSE_2p" that does the same as "SSE_1p" but instead takes a __vector__ as an argument: $e = \log(F_{ij}) - \beta[0] - \beta[1] \log(d_{ij})$

   2. Do a grid search with the first parameter range as 1500 to 2500 in steps of 10, and the second parameter range as -1 to +3 (as before) in steps of .005. Do it using either `dask` or `multiprocessing`. What were your results? Comment on how or why the parameter on $\log(d_{ij})$ differs from what you found in part B.

   NOTE: If the estimate hits the lower bound of a range, try a different parameter range to search over.
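
A sequential sketch of the two-parameter version follows, including a check for the boundary problem mentioned in the note. The data are synthetic (true intercept 2.0, true slope -1.0), and the grids are deliberately small and hypothetical; they are not the ranges the assignment asks for, and the parallelization is left out.

```python
import numpy as np

# Hypothetical data built so that log(F) = 2.0 - 1.0 * log(d) exactly.
log_d = np.log(np.array([10.0, 100.0, 1000.0]))
log_F = 2.0 - 1.0 * log_d

def SSE_2p(beta):
    """beta[0] is the intercept, beta[1] the coefficient on log distance."""
    e = log_F - beta[0] - beta[1] * log_d
    return np.sum(e ** 2)

# Small illustrative grids; the assignment's ranges and steps differ.
intercepts = np.arange(0.0, 4.0 + 0.25, 0.5)
slopes = np.arange(-2.0, 2.0 + 0.25, 0.5)

# Evaluate every (intercept, slope) pair and keep the best one.
grid = [(b0, b1) for b0 in intercepts for b1 in slopes]
sse = np.array([SSE_2p(b) for b in grid])
best_b0, best_b1 = grid[int(np.argmin(sse))]

# If the minimizer sits on an edge of the grid, the range is likely too narrow.
on_edge = (best_b0 in (intercepts[0], intercepts[-1])
           or best_b1 in (slopes[0], slopes[-1]))
print(best_b0, best_b1, "on edge:", on_edge)
```

Because each grid point is evaluated independently, the list comprehension over `grid` is the piece that would be handed to `dask` or `multiprocessing`.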
     
E. __(1 point)__ Now use the "econtools" package "reg" function to estimate the equation from part A, but this time with a constant term. How much did the distance coefficient change?
     
F. __(1 point)__ Finally, merge in the GDP numbers from the World Bank and do the regression again, except this time adding the log of each country's GDP as covariates. How did the coefficient on distance change?
     
### Question 6 __(3 points)__
An analyst asks you, "For which commodity sectors is distance most important in terms of exporting?" Describe what your economic intuition would say about the answer, and then map out an analysis strategy for answering their question. As usual, describe:

- What data you would need.
- How you would manipulate the data.
- What analysis you would run.

In [1]:
import glob
import pandas as pd
from econtools.metrics import reg
import numpy as np
import dask.dataframe as dd
from dask.distributed import Client, progress
import multiprocess as mp
import matplotlib.pyplot as plt
import json
import dask
import geopandas as gp
from math import radians, cos, sin, asin, sqrt
from multiprocessing.pool import ThreadPool, Pool
from dask import delayed
from dask import compute

def haversine(row):
    """
    Calculate the great circle distance in kilometers between two points
    on the earth (specified in decimal degrees).

    `row` must unpack to (lon1, lat1, lon2, lat2).
    """
    lon1, lat1, lon2, lat2 = row
    # convert decimal degrees to radians 
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])

    # haversine formula 
    dlon = lon2 - lon1 
    dlat = lat2 - lat1 
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a)) 
    r = 6371 # Radius of earth in kilometers. Use 3956 for miles
    return c * r
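
The row-wise function above works with `DataFrame.apply`, but the same formula can also be written to operate on whole arrays at once. The sketch below is one such variant; the name `haversine_np`, its `radius_km` parameter, and the sample coordinates are all hypothetical and not part of the original notebook.

```python
import numpy as np

def haversine_np(lon1, lat1, lon2, lat2, radius_km=6371.0):
    """Great-circle distance in km for scalars or arrays of decimal degrees."""
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    # Clipping guards against rounding that pushes a slightly outside [0, 1].
    return 2 * radius_km * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

# Hypothetical coordinates: one origin point against two destinations,
# each a quarter of the way around the globe.
origin_lon, origin_lat = 0.0, 0.0
dest_lon = np.array([0.0, 90.0])
dest_lat = np.array([90.0, 0.0])

dist = haversine_np(origin_lon, origin_lat, dest_lon, dest_lat)
print(dist)
```

Broadcasting lets a single origin (such as Uruguay's centroid) be compared against every partner centroid in one call, which avoids a Python-level loop over rows.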