# Alpacalypse Now
While the United States is home to millions of dairy cows, it is also home to a much furrier type of livestock from South America: the alpaca.

Alpacas are wool-producing camelids from the Andes that, thanks to an unusual tax loophole, became the subject of a large asset bubble in US agriculture around 2008. In 1995, there were around 10,000 alpacas in the United States; by 2006, there were more than 85,000.

[When the Great Alpaca Bubble Burst - Priceonomics](https://priceonomics.com/when-the-great-alpaca-bubble-burst/)

In a 2007 paper titled ["Alpaca Lies? Speculative Bubbles in Agriculture: Why They Happen and How to Recognize Them"](https://doi.org/10.1111/j.1467-9353.2007.00343.x), Tina Saitone and Richard Sexton noted that the auction price of alpacas had at one point reached $75,000 per animal. Using data on rearing costs, they showed that the price of alpaca wool would have had to grow 20% a year for several years to justify this price.

As it turned out, the alpaca market was yet another speculative commodity bubble. Between 2006 and 2011, the price of alpacas dropped by $30,000 (see their follow-up [article](https://s.giannini.ucop.edu/uploads/giannini_public/51/b2/51b2d799-65e5-47cf-9a1b-060c75a79501/v15n5_3.pdf)).

Was this the end of the alpaca industry in the United States? What is the state of the US alpaca industry today? Using data from the USDA National Agricultural Statistics Service (NASS) Quick Stats API, you will characterize the alpaca industry since 2007.

Using the NASS API, I have already downloaded four state-level variables from the Census of Agriculture for 2007, 2012, and 2017: the number of alpacas and the number of operations with alpacas (inventory data), and the total revenue from alpaca sales and the total number of alpacas sold (sales data).

I have merged the two sales variables using the code below:

```python
nass_key = "A9150F7B-9924-3966-9A15-F16AE9683F77"  # Quick Stats API key
```

```python
import requests
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

def download_alpaca_data(short_desc):
    """Download one state-level Census of Agriculture variable from NASS Quick Stats."""
    url = "https://quickstats.nass.usda.gov/api/api_GET/"
    params = {
        "key": nass_key,            # the API key defined above
        "year__GE": "2007",         # Census years from 2007 on; "__GE" means "greater than or equal to"
        "domain_desc": "TOTAL",     # total across all domains
        "source_desc": "CENSUS",    # the Census, not a survey
        "agg_level_desc": "STATE",  # state-level values
        "short_desc": short_desc,   # the variable name, so no further params are needed
    }
    r = requests.get(url, params=params)
    r.raise_for_status()  # stop with an HTTPError instead of returning None on a failed request
    return pd.DataFrame(r.json()["data"])

## Downloading the Sales Data
alpc_sales_dollars = download_alpaca_data("ALPACAS - SALES, MEASURED IN $").rename(columns={"Value": "sales_dollars"})
alpc_sales_head = download_alpaca_data("ALPACAS - SALES, MEASURED IN HEAD").rename(columns={"Value": "sales_head"})

# Keep only the keys and the value column from each frame so the merge
# does not create _x/_y copies of the other shared columns
sales = alpc_sales_dollars[["state_name", "year", "sales_dollars"]].merge(
    alpc_sales_head[["state_name", "year", "sales_head"]], on=["state_name", "year"]
)
sales = sales[["state_name", "year", "sales_head", "sales_dollars"]]

sales.to_csv("alpaca_sales.csv", index=False)
```
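By default, `merge` performs an inner join, so any state-year that appears in only one of the two sales downloads is silently dropped from `sales`. Below is a minimal sketch of one way to see what was dropped. `audit_merge` is a hypothetical helper, while `how`, `indicator` and `validate` are standard `DataFrame.merge` parameters.

```python
import pandas as pd


def audit_merge(left, right, keys=("state_name", "year")):
    """Outer-join two frames on `keys` and return the key rows found in only one of them.

    Hypothetical helper for checking what an inner join on the same keys would drop.
    """
    keys = list(keys)
    audit = left[keys].merge(
        right[keys],
        on=keys,
        how="outer",
        indicator=True,         # adds a "_merge" column: left_only / right_only / both
        validate="one_to_one",  # raises MergeError if a state-year appears twice in either frame
    )
    return audit[audit["_merge"] != "both"]
```

After the download cell has run, `audit_merge(alpc_sales_dollars, alpc_sales_head)` would list the state-years missing from one side. An empty result means the inner join lost nothing.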

I have also downloaded the inventory data but have not merged it.

```python
# Save the two inventory variables without merging them
download_alpaca_data("ALPACAS - INVENTORY").to_csv("nass_data1.csv", index=False)
download_alpaca_data("ALPACAS - OPERATIONS WITH INVENTORY").to_csv("nass_data2.csv", index=False)
```
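The files saved above keep Quick Stats' `Value` column as text, with thousands separators such as "1,234". Below is a minimal sketch of one way to turn it into integers. It assumes that withheld cells show up as non-numeric codes such as "(D)" and treats them as missing. A plain integer column cannot hold missing values, so the sketch uses pandas' nullable `Int64` dtype. Both function names are hypothetical.

```python
import pandas as pd


def clean_value(series):
    """Turn Quick Stats value strings such as "1,234" into numbers.

    Assumes withheld cells are text codes such as "(D)"; anything that is not
    a number after stripping spaces and commas becomes NaN.
    """
    cleaned = series.astype(str).str.strip().str.replace(",", "", regex=False)
    return pd.to_numeric(cleaned, errors="coerce")


def tidy_nass(df, new_name):
    """Keep state_name and year, and store the cleaned "Value" column under new_name."""
    out = df[["state_name", "year"]].copy()
    # Nullable "Int64" keeps whole numbers as integers while allowing missing values
    out[new_name] = clean_value(df["Value"]).astype("Int64")
    return out
```

Applied to the saved files, this would be `tidy_nass(pd.read_csv("nass_data1.csv"), "inventory")` and `tidy_nass(pd.read_csv("nass_data2.csv"), "farms")`, joined with `.merge(..., on=["state_name", "year"])`. The `sales_head` and `sales_dollars` columns in "alpaca_sales.csv" can be passed through `clean_value` directly.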

Now do the following tasks. __For all graphs, label the x-axis and the y-axis__:

1. Data cleaning (2 points, 1/2 point each)

    a. Read in the data "nass_data1.csv" and clean the "Value" column to be an integer. Rename the column to "inventory".

    b. Read in the data "nass_data2.csv" and clean the "Value" column to be an integer. Rename the column to "farms".

    c. Merge the data sources together into one dataframe using "state_name" and "year" as the keys.

    d. Finally, read in the sales data ("alpaca_sales.csv"), clean the "sales_head" and "sales_dollars" columns to be numeric instead of strings, and merge them in.


2. Data analysis (3 points, 1 point each)
    
    a. Make a bar graph showing the number of alpacas in the entire US in the years 2007, 2012, and 2017. 

    b. Using `twinx`, add a second set of bars to the graph above showing the number of farms on its own y-axis (HINT: set `position = 1` for the second graph and `position = 0` for the first graph).
    
    c. Report the following summary statistics for each year __in one table__:
        - The average alpaca herd size.
        - The average alpaca sale price.
    
3. __EXTRA CREDIT (1 point)__: Which 5 states had the highest growth in their alpaca population between 2007 and 2017? Show it on a bar graph.
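The analysis tasks could be approached as in the sketch below, which rests on several assumptions:

- The merged frame, called `alpacas` here, has the columns `state_name`, `year`, `inventory`, `farms`, `sales_head` and `sales_dollars`.
- The function names are hypothetical.
- Average herd size and average sale price are ratios of summed totals (alpacas per farm, dollars per head sold) over the states that report both values in a year, rather than a mean of state-level ratios.
- "Growth" means percentage change in inventory.

Because withheld cells are skipped, the national totals in the bar graph understate the true totals wherever a state's value was withheld.

```python
import pandas as pd
import matplotlib.pyplot as plt


def us_totals(alpacas):
    """Sum the state rows to national totals for each Census year (missing cells are skipped)."""
    cols = ["inventory", "farms", "sales_head", "sales_dollars"]
    return alpacas.groupby("year")[cols].sum().astype(float)


def plot_alpacas_and_farms(totals):
    """Bars for alpacas (left axis) next to bars for farms (right axis, via twinx)."""
    fig, ax = plt.subplots()
    ax2 = ax.twinx()  # second y-axis sharing the same x-axis
    # position=0 puts a bar to the right of its tick, position=1 to the left
    totals["inventory"].plot(kind="bar", ax=ax, width=0.4, position=0, rot=0, color="tab:blue")
    totals["farms"].plot(kind="bar", ax=ax2, width=0.4, position=1, rot=0, color="tab:orange")
    ax.set_xlim(-0.5, len(totals) - 0.5)  # keep both half-width bars fully visible
    ax.set_xlabel("Census year")
    ax.set_ylabel("Alpacas (head)", color="tab:blue")
    ax2.set_ylabel("Farms with alpacas", color="tab:orange")
    return fig


def ratio_by_year(alpacas, num, den):
    """Sum num and den by year over the states reporting both, then divide."""
    both = alpacas.dropna(subset=[num, den])
    sums = both.groupby("year")[[num, den]].sum()
    return sums[num] / sums[den]


def summary_table(alpacas):
    """Average herd size and average sale price per Census year, in one table."""
    return pd.DataFrame({
        "avg_herd_size": ratio_by_year(alpacas, "inventory", "farms"),
        "avg_sale_price": ratio_by_year(alpacas, "sales_dollars", "sales_head"),
    })


def top_growth(alpacas, start=2007, end=2017, n=5):
    """The n states with the largest percentage change in inventory from start to end."""
    wide = alpacas.pivot(index="state_name", columns="year", values="inventory")
    wide = wide[wide[start] > 0]  # growth is undefined for a zero or missing base
    growth = (wide[end] / wide[start] - 1) * 100
    return growth.dropna().nlargest(n)


def plot_top_growth(growth):
    """Bar graph of the growth figures returned by top_growth."""
    fig, ax = plt.subplots()  # new figure, so the earlier twinx chart is not drawn over
    growth.astype(float).plot(kind="bar", ax=ax, rot=45)
    ax.set_xlabel("State")
    ax.set_ylabel("Growth in alpaca inventory (%)")
    return ax
```

With the merged frame, `plot_alpacas_and_farms(us_totals(alpacas))`, `summary_table(alpacas)` and `plot_top_growth(top_growth(alpacas))` would produce the three outputs. For absolute rather than percentage growth, replace the ratio in `top_growth` with `wide[end] - wide[start]`.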