# Exploratory Data Analysis


An initial visual analysis of the industry project data prepared in `data_prep.ipynb`.

Questions to explore:
1. What are the distributions of deal type, lender type, company type, and operational status?
2. How are major owners distributed by country?
3. How many of these companies are operating on foreign soil?
4. How many of these companies come from outside South America?
5. Are there any trends between project country and company country?
6. Does any company, host country, or company headquarters country dominate the others? Are there any trends in new projects?
7. How many major owners operate on foreign soil?
8. Is there a correlation between owner type and the number of projects per country?
9. Where are these projects distributed within each country?
10. How are these projects distributed within the Amazon River Basin?
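Questions 3, 4, and 7 reduce to comparing each project's host country with its owner's home country. The sketch below works on toy data and assumes hypothetical columns `company`, `host_country`, and `company_country`; the real column names in `all_projects.csv` may differ. The set of South American countries is written out by hand for illustration.

```python
import pandas as pd

# Toy stand-in for the project table; `company`, `host_country`, and
# `company_country` are hypothetical placeholder column names.
projects = pd.DataFrame({
    "company": ["A", "B", "C", "D", "E"],
    "host_country": ["Brazil", "Peru", "Brazil", "Colombia", "Bolivia"],
    "company_country": ["Brazil", "China", "Canada", "Colombia", "Brazil"],
})

# The 12 sovereign states of South America (hand-written for this sketch).
south_american = {
    "Argentina", "Bolivia", "Brazil", "Chile", "Colombia", "Ecuador",
    "Guyana", "Paraguay", "Peru", "Suriname", "Uruguay", "Venezuela",
}

# Q3/Q7: a project is "on foreign soil" when the owner's home country
# differs from the country hosting the project.
projects["foreign"] = projects["host_country"] != projects["company_country"]

# Q4: flag owners headquartered outside South America.
projects["owner_outside_sa"] = ~projects["company_country"].isin(south_american)

n_foreign = int(projects["foreign"].sum())
n_outside = int(projects.loc[projects["owner_outside_sa"], "company"].nunique())
print(n_foreign, n_outside)
```

Note that Brazil operating in Bolivia counts as foreign for question 3 but not as outside South America for question 4, which is why the two flags are kept separate.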

Steps:
1. Visualize and review the spatial distribution of projects (by continent, country, and biome, segmented by industry)
2. Visualize and review the distribution of ownership (by country, company, and ownership style)
3. Compare the distribution of projects (by host country, company, and sector) to their features (ownership style, sector, and size)
4. Look for patterns between project attributes and participating entities (nations and enterprises)
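Steps 2 and 3 (and question 8) start from simple counts. Because owner type is categorical, a contingency table of host country by owner type is a natural first look before any correlation measure. A minimal sketch on toy data, assuming hypothetical columns `owner_type` and `host_country`:

```python
import pandas as pd

# Toy data; `owner_type` and `host_country` are hypothetical column names.
projects = pd.DataFrame({
    "owner_type": ["State", "Private", "Private", "State", "Private", "Private"],
    "host_country": ["Brazil", "Brazil", "Peru", "Peru", "Peru", "Chile"],
})

# Step 2: overall distribution of ownership style.
owner_counts = projects["owner_type"].value_counts()

# Step 3: projects per host country, split by ownership style.
# Rows are host countries, columns are owner types, cells are project counts.
table = pd.crosstab(projects["host_country"], projects["owner_type"])

print(owner_counts)
print(table)
```

Passing `normalize="index"` to `pd.crosstab` turns the counts into per-country shares, which makes countries with very different project totals easier to compare.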

```python
# Import libraries
import pandas as pd
import geopandas as gpd
import plotly.express as px
import folium as f
import warnings
```

```python
# Suppress warnings in the notebook output (this also hides deprecation warnings)
warnings.filterwarnings('ignore')
```

```python
# Load data
project_path = "../data/all_projects.csv"
coords_path = "../data/project_coords.csv"

df = pd.read_csv(project_path)
coords = pd.read_csv(coords_path)

# Convert the coordinate table to a GeoDataFrame of points.
# Longitude is x and latitude is y; raw lat/long values are WGS 84 (EPSG:4326).
coords = gpd.GeoDataFrame(
    coords,
    geometry=gpd.points_from_xy(coords.longitude, coords.latitude),
    crs="EPSG:4326",
)
```
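Questions 9 and 10 need to know which polygon each project point falls in, which is a spatial join. The sketch below uses two toy rectangular "countries" (`Westland`, `Eastland`, both made up) and a hypothetical `project_id` column; on the real data the equivalent call would be `gpd.sjoin(coords, south_america, how="left", predicate="within")`, assuming both layers are in EPSG:4326.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box

# Toy country polygons (lon/lat rectangles) standing in for `south_america`.
countries = gpd.GeoDataFrame(
    {"name": ["Westland", "Eastland"]},
    geometry=[box(-70, -20, -60, 0), box(-60, -20, -50, 0)],
    crs="EPSG:4326",
)

# Toy project coordinates; `project_id` is a hypothetical column.
raw = pd.DataFrame({
    "project_id": [1, 2, 3],
    "longitude": [-65.0, -55.0, -52.0],
    "latitude": [-10.0, -5.0, -15.0],
})
points = gpd.GeoDataFrame(
    raw,
    geometry=gpd.points_from_xy(raw.longitude, raw.latitude),
    crs="EPSG:4326",
)

# Attach the containing country's attributes to each project point.
# how="left" keeps points that fall outside every polygon (their `name` is NaN).
joined = gpd.sjoin(points, countries, how="left", predicate="within")

# Number of projects per country.
per_country = joined.groupby("name")["project_id"].count()
print(per_country)
```

Both layers must share a CRS for the join to be meaningful. The same call with an Amazon River Basin polygon would address question 10, but that boundary would have to come from a separate file that this notebook does not load.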

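```python
# Load country geo data

# Get a GeoDataFrame for all countries from the Natural Earth low-resolution layer.
# Note: geopandas.datasets was deprecated in GeoPandas 0.14 and removed in 1.0,
# so this call requires geopandas < 1.0 (the deprecation warning is suppressed above).
world_path = gpd.datasets.get_path('naturalearth_lowres')
world = gpd.read_file(world_path)

# Keep only features on the South American continent
# (this includes territories such as the Falkland Islands)
south_america = world[world.continent == "South America"]
south_america
```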

| | pop_est | continent | name | iso_a3 | gdp_md_est | geometry |
|---|---|---|---|---|---|---|
| 9 | 44938712.0 | South America | Argentina | ARG | 445445 | MULTIPOLYGON (((-68.63401 -52.63637, -68.25000... |
| 10 | 18952038.0 | South America | Chile | CHL | 282318 | MULTIPOLYGON (((-68.63401 -52.63637, -68.63335... |
| 20 | 3398.0 | South America | Falkland Is. | FLK | 282 | POLYGON ((-61.20000 -51.85000, -60.00000 -51.2... |
| 28 | 3461734.0 | South America | Uruguay | URY | 56045 | POLYGON ((-57.62513 -30.21629, -56.97603 -30.1... |
| 29 | 211049527.0 | South America | Brazil | BRA | 1839758 | POLYGON ((-53.37366 -33.76838, -53.65054 -33.2... |
| 30 | 11513100.0 | South America | Bolivia | BOL | 40895 | POLYGON ((-69.52968 -10.95173, -68.78616 -11.0... |
| 31 | 32510453.0 | South America | Peru | PER | 226848 | POLYGON ((-69.89364 -4.29819, -70.79477 -4.251... |
| 32 | 50339443.0 | South America | Colombia | COL | 323615 | POLYGON ((-66.87633 1.25336, -67.06505 1.13011... |
| 40 | 28515829.0 | South America | Venezuela | VEN | 482359 | POLYGON ((-60.73357 5.20028, -60.60118 4.91810... |
| 41 | 782766.0 | South America | Guyana | GUY | 5173 | POLYGON ((-56.53939 1.89952, -56.78270 1.86371... |
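To map project counts by host country (step 1, question 2), one option is to count projects per country and left-merge the counts onto the country table. A minimal sketch with toy stand-ins: `south_america` here mirrors only the `name` and `iso_a3` columns shown above (no geometry), and `host_country` is a hypothetical project column. Project country names would need to match the Natural Earth spellings exactly (e.g. "Falkland Is."), so a renaming step may be needed on the real data.

```python
import pandas as pd

# Toy stand-ins for the country table and the project table.
south_america = pd.DataFrame({
    "name": ["Argentina", "Brazil", "Peru", "Guyana"],
    "iso_a3": ["ARG", "BRA", "PER", "GUY"],
})
projects = pd.DataFrame({"host_country": ["Brazil", "Brazil", "Peru", "Brazil"]})

# Count projects per host country and expose the country as a `name` column to join on.
counts = (
    projects["host_country"]
    .value_counts()
    .rename_axis("name")
    .reset_index(name="n_projects")
)

# Left-merge so countries with no projects stay in the table, with a count of 0.
merged = south_america.merge(counts, on="name", how="left")
merged["n_projects"] = merged["n_projects"].fillna(0).astype(int)
print(merged)
```

When the left side is the real GeoDataFrame, the merge keeps its geometry, so `merged.plot(column="n_projects", legend=True)` would draw a choropleth. Alternatively, `px.choropleth(merged, locations="iso_a3", color="n_projects")` matches on ISO-3 codes and does not need the geometry column.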


## 