# Assignment #1: Evaluate Data Set

In your first graded assignment for this course, you will find and evaluate a data set on a topic of your choosing. You will be given specific questions related to the data set, and you will also be tasked with importing the data from your data set into your Jupyter Notebook environment. 

## Choose a Broad Topic

First, choose a topic of interest to you, such as a specific social issue, a political or cultural trend, a community, or a hobby or personal interest. Ideally, it should be something relatively broad that society or culture is having some sort of conversation about, or which many people think about on a regular basis. Possible examples might be:

- Housing and rental prices in New York City
- A popular genre of music
- A sport
- Food prices
- Crime
- Government spending
- Popularity of movies or TV shows 
- Bestselling books

## Find a Data Set

Search Google for terms related to your topic, both on their own and combined with the names of websites where data is commonly found. The following websites are common places to find data sets, and it often makes sense both to search directly on the site and to run a regular Google search that includes the site name:

- [Google Dataset Search](https://datasetsearch.research.google.com/)
- [GitHub](https://github.com/)
- [Kaggle](https://www.kaggle.com/)
- [Data.gov](https://www.data.gov/)

For example, if your topic is bestselling books, you might try these searches on Google:

- bestselling books data set
- bestselling books data set kaggle
- bestselling books data set github

You're not likely to find bestselling books data on data.gov, but you might find a lot of info related to health, the economy, and demographics, since that's what the government is most concerned about.

-----

What is your topic? (Enter responses to questions like these in the cell provided below.)

U.S. foreign assistance

What drew you to this topic, or why did you choose it?

I have previously written papers on South Yemen (1967-1991) and Libya and found some interesting anecdotes on foreign assistance. I think it would be interesting to examine U.S. direct foreign assistance, and it could be helpful in writing a paper later in my major.

## Choose a Data Set

Choose a data set to use for the rest of this assignment. The data set doesn't need to be perfect, but ideally it should interest you and be in one of the discussed formats (CSV, TSV, Excel, JSON, .txt). If the data is in another format, either reach out to me to ask about it or choose another data set for this assignment. You may want to look ahead to the rest of this assignment to make sure nothing about the data set will make it difficult to answer the questions or to import the data.

What is the name of your data set? Or provide a one-sentence description.

"U.S. Overseas Loans and Grants" or "The GreenBook"

What is the URL of the page on which you found the data set? (Paste the full URL.)

https://catalog.data.gov/dataset/u-s-overseas-loans-and-grants-greenbook

## Evaluation

In this step, you will evaluate the data to the extent possible without using data science tools like Python or Pandas. You should download the data to your computer, and use both the data set itself as downloaded and the page you found it on to answer these questions. If you can't answer a question, write what you tried to do to answer it—don't give up right away, and try to think of other ways to answer the question. (In some cases, you can even contact the person who created the data set—if you do so, feel free to copy me on the email.)

What is the file format or file extension of the data set? Examples might be .csv (Comma Separated Values), .tsv (Tab Separated Values), .xlsx (Excel workbook), .txt (plain text file), or .json (JavaScript Object Notation).

Excel workbook (.xlsx)

What is the size in megabytes (MB) or gigabytes (GB) of the data set?

4.3 megabytes

How many columns or fields are in the data set? (Columns or fields are different types of data. For example, a book data set might have title, author, and year as columns.)

9 columns

List out the columns in the data set. (You can put each column on a line, or you can separate the columns with commas.)

Fiscal Year, Region, Country, Assistance Category, Publication Row, Funding Agency, Funding Account Name, Obligations (Historical Dollars), Obligations (Constant Dollars)

How many rows are in the data set? (To use the book data set example again, each row might represent one book.)

72,644 rows

What types of data appear in the data set? (You can use Python terms, like "integer," "float," "boolean," "string," or you can use other descriptive terms, like "numeric data" or "text data." Try to be as comprehensive as possible in your answer.)

Numeric (dollar amounts), qualitative (category of assistance, country, funding agency), and chronological (year the money was allocated) data.

On initial inspection, does anything appear to be missing or wrong in the data set? (Don't spend too long on this.)

Not that I can see

What kinds of questions could you answer with this data set? In answering this question, write at least one paragraph of at least 150 words.

Because of the role of the U.S. in global affairs, this data set could answer many questions. How much money does the U.S. provide in economic or military assistance? Which countries receive the most military assistance? Does U.S. military or economic assistance trend toward one country or region of the world? Has economic or military assistance increased over the years? Are there years in which military or economic assistance dipped or spiked? Has the way assistance is allocated shifted between departments? The data set alone can't explain why changes occurred, but examining it makes it easier to see historical trends and make inferences about why spending changed. For example, for my paper about Libya, it would be fascinating to see how U.S. assistance changed through political developments in the country (such as the coup in 1969 or the murder of Qaddafi in 2011). Did the U.S. previously provide military or economic aid that stopped, decreased, or increased after the 1969 revolution? Did the aid increase, start, stop, or decrease in 2011? This could be quite valuable for analysing the role of U.S. economic and military aid in the world or in a particular country.

Do you see any issues or limitations with the data set? Alternatively, what do you wish was included in the data set that is not included? (Write at least one paragraph of at least 150 words.)

The first thing I noticed was that Yemen is treated as one country even for the years when it was split in two (1967-1991). This makes insightful analysis of U.S. assistance to South Yemen during those years particularly difficult without more details, which are not included in the data set. I assume the same is true of other countries that have had competing governments or whose statehood differed from what it is now. Additionally, some of the historical data is incomplete. The Publication Row and Funding Agency columns are missing some of the data and are instead filled with "unknown" or "inactive programs". I wish they had done more research to determine the funding agencies, as I'm almost certain there are documents that would reveal this. I also wish they had named the inactive programs instead of just listing "inactive programs", because it might still be relevant to see which programs the funding was listed under in order to determine the role of those programs at the time. Finally, the data set is missing the greater context of world history, so the data can't necessarily be connected to broader trends, though this is a limitation of most data sets.

## Reading the Data in Python

Using as many cells as you need in the rest of this notebook, load the data into Python. You will probably want to use Pandas to load the data. Some example code is provided for you below. 

To use the example code in a Jupyter Notebook on your computer, you will need to make sure your data set is in the same folder as your notebook, and that you get the filename *exactly* right, including the extension. Here is example code for a Jupyter Notebook—this is just to get you started, and you are responsible for getting this working, which may involve looking up how to import data using Python and Pandas on Google.

```python
import pandas

df = pandas.read_csv('name_of_data_file.csv')

df
```
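Because the filename has to match exactly, it can help to confirm that the file is where Python expects it before calling Pandas. The sketch below is one possible way to do that with the standard library; the helper name `find_data_file` is hypothetical and not part of the assignment.

```python
from pathlib import Path


def find_data_file(filename, folder="."):
    """Return the path to filename inside folder, or raise a helpful error.

    Hypothetical helper for illustration only.
    """
    path = Path(folder) / filename
    if path.is_file():
        return path
    # List files with the same extension to help spot a typo in the name.
    candidates = sorted(p.name for p in Path(folder).glob(f"*{path.suffix}"))
    raise FileNotFoundError(
        f"{filename!r} not found in {Path(folder).resolve()}; "
        f"files with the same extension: {candidates}"
    )
```

Calling `find_data_file('name_of_data_file.csv')` from the notebook either returns the path or raises an error listing similarly named files in the folder.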

If your data is in another format, you will need to use the Pandas function related to that format. For example, to import JSON:

```python
import pandas

df = pandas.read_json('name_of_data_file.json')

df
```
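As a rough sketch of how the file extension determines the Pandas function, the mapping below pairs a few of the formats discussed in this assignment with `pandas.read_csv`, `pandas.read_json`, and `pandas.read_excel`. The names `READERS` and `load_data` are hypothetical, and plain `.txt` files are left out because the right settings depend on how the text is laid out.

```python
from pathlib import Path

import pandas

# Map each file extension to the pandas reader that handles it.
READERS = {
    ".csv": pandas.read_csv,
    ".tsv": lambda path: pandas.read_csv(path, sep="\t"),  # tab-separated
    ".json": pandas.read_json,
    ".xlsx": pandas.read_excel,  # needs the openpyxl package installed
}


def load_data(filename):
    """Load filename with the reader matching its extension."""
    extension = Path(filename).suffix.lower()
    if extension not in READERS:
        raise ValueError(f"No reader configured for {extension!r} files")
    return READERS[extension](filename)
```

With this in place, `df = load_data('name_of_data_file.tsv')` would read a tab-separated file without any other changes to the notebook.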

Use as much space as you need below to import the file into Pandas (as above). Make sure the dataframe (df) is shown as an output at the end.


```python
import pandas as pd
```

```python
df = pd.read_excel("/Users/lucaslarsen/Documents/CCNY Python Portfolio/foundations-of-data-science/assignments/greenbookUS.xlsx")
```

I believe the format of the Excel spreadsheet is causing some problems when pandas reads the document. The first line of the sheet is a title, 'U.S. Economic and Military Assistance Fiscal Years 1946-2019', which has seemingly been picked up as the header row.
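One possible way to handle a title line above the real column headers is the `skiprows` argument, which both `pandas.read_csv` and `pandas.read_excel` accept. The sketch below uses a small in-memory CSV as a stand-in for the workbook so it runs without the file; the assumption that exactly one line needs skipping comes from the description above and has not been checked against the actual spreadsheet.

```python
import io

import pandas

# Stand-in for the workbook: a title line above the real column headers.
raw = io.StringIO(
    "U.S. Economic and Military Assistance Fiscal Years 1946-2019\n"
    "Fiscal Year,Region,Country\n"
    "1946,Middle East and North Africa,Egypt\n"
    "1946,Middle East and North Africa,Iran\n"
)

# Skip the title line so the second line is used as the header row.
df_fixed = pandas.read_csv(raw, skiprows=1)
print(list(df_fixed.columns))
```

For the Excel file itself, the analogous call would be `pd.read_excel(path, skiprows=1)`, adjusting the number if the sheet has more than one line above the headers.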


```python
df
```

| Unnamed: 0 | Fiscal Year | Region | Country | Assistance Category | Publication Row | Funding Agency | Funding Account Name | Obligations (Historical Dollars) | Obligations (Constant Dollars) |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1946.0 | Middle East and North Africa | Egypt | Economic | Inactive Programs | Unknown - Historical Greenbook | INACTIVE - US Surplus Property | 9.300000e+06 | 1.000840e+08 |
| 1 | 1946.0 | Middle East and North Africa | Egypt | Economic | Inactive Programs | Unknown - Historical Greenbook | INACTIVE - UN Relif and Rehab Agency (UNRRA) | 3.000000e+05 | 3.228517e+06 |
| 2 | 1946.0 | Middle East and North Africa | Iran | Economic | Inactive Programs | Unknown - Historical Greenbook | INACTIVE - US Surplus Property | 3.300000e+06 | 3.551368e+07 |
| 3 | 1946.0 | Middle East and North Africa | Lebanon | Economic | Inactive Programs | Unknown - Historical Greenbook | INACTIVE - US Surplus Property | 1.600000e+06 | 1.721876e+07 |
| 4 | 1946.0 | Middle East and North Africa | Saudi Arabia | Economic | Inactive Programs | Department of the Treasury | INACTIVE - Lend Lease Silver | 2.400000e+06 | 2.582813e+07 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 72632 | 2019.0 | World | World (not specified) | Military | Foreign Military Financing | Department of State | Foreign Military Financing Program | 1.976689e+06 | 1.976689e+06 |
| 72633 | 2019.0 | World | World (not specified) | Military | International Military Education and Training | Department of State | International Military Education and Training | 3.635807e+06 | 3.635807e+06 |
| 72634 | 2019.0 | World | World (not specified) | Military | Peace Keeping Operations | Department of State | Peace Keeping Operations | 1.147500e+08 | 1.147500e+08 |
| 72635 | 2019.0 | World | World (not specified) | Military | Other Military Assistance | Department of Defense | Operation and Maintenance, Air Force | 3.000000e+06 | 3.000000e+06 |
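
Once the dataframe is loaded, the answers given in the Evaluation section (row count, column count, data types, missing values) could be double-checked with a few Pandas attributes. The sketch below uses a small `sample` dataframe built from three of the rows shown in the output, as a stand-in for the full data set; the same calls should work on `df`.

```python
import pandas

# A few rows copied from the output above, standing in for the full data set.
sample = pandas.DataFrame({
    "Fiscal Year": [1946.0, 1946.0, 2019.0],
    "Country": ["Egypt", "Iran", "World (not specified)"],
    "Assistance Category": ["Economic", "Economic", "Military"],
    "Obligations (Historical Dollars)": [9.3e6, 3.3e6, 1.1475e8],
})

rows, columns = sample.shape   # number of rows and number of columns
print(rows, columns)
print(sample.dtypes)           # data type of each column
print(sample.isna().sum())     # count of missing values in each column
```

Running `df.shape` on the full dataframe is a quick way to compare against the 72,644 rows counted in the spreadsheet, since the last index shown in the output is 72635.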
