# Extraterrestrial Diplomatic Service Project

#### The Problem Statement

It's 2733, and you are a data scientist for the Extraterrestrial Diplomatic Service. The Service is regularly tasked with providing guidance to the Galactic Council on potential trade and business relations with extraterrestrial civilizations. This analysis helps the Council in understanding the potential for fruitful collaborations.

At the annual meeting of the Extraterrestrial Diplomatic Service, presenters highlighted the success of several joint space missions conducted in collaboration with extraterrestrial civilizations. They suggested that these past achievements could indicate potential for expanding partnerships into areas such as trade and business. They wondered what other characteristics of extraterrestrial civilizations could serve as predictors of future successful partnerships.

**Your job** is to perform exploratory data analysis (EDA) on the dataset to begin this analysis.
**The goal** is to create a report that:
1. Recommends to the Council variables that could serve as predictors of future successful partnerships.
2. Backs up your suggestions with numerical data and graphs.

- Your dataset, `extraterrestrial_civilizations.csv`, contains information on a randomly selected set of 50 civilizations for the following variables:

`Name_of_civilzation`: The civilization's name

`Years_since_first_contact`: Number of years since humanity first made contact with this civilization. (0-300)

`Technological_progress`: A measure of the civilization's overall technological progress on a scale from 1 to 100.

`Diplomatic_relations_index`: A measure of diplomatic relations between Earth and the civilization on a scale from 1 to 10, with higher values indicating more positive relations.

`Cultural_exchange_index`: A measure of the degree of cultural exchange between Earth and the civilization on a scale from 1 to 10, with higher values indicating more exchange.

`Joint_space_missions`: The number of joint space missions between Earth and the civilization.

`Hostility_to_Earth_Index`: A measure of the civilization's hostility to Earth on a scale from 1 to 10, with higher values indicating more hostility.

`Degree_of_positive_contact`: A continuous variable measuring the degree of positive contact with Earth on a scale from 1 to 100, with higher values indicating more positive contact.



In [21]:
import pandas as pd
import matplotlib.pyplot as plt

# read the CSV file into a pandas DataFrame 
et_data = pd.read_csv("../datasets/Extraterrestrial_civilizations.csv")

# display the first five rows
et_data.head()

|   | Name_of_civilzation | Years_since_first_contact | Technological_progress | Diplomatic_relations_index | Cultural_exchange_index | Joint_space_missions | Hostility_to_Earth_Index | Degree_of_positive_contact |
|---|---|---|---|---|---|---|---|---|
| 0 | Venusians | 50.0 | 20.0 | 7.0 | 5.0 | 1.0 | 3.0 | 55.0 |
| 1 | Gargeleblobs | 120.0 | 85.0 | 8.0 | NaN | 12.0 | 2.0 | 88.0 |
| 2 | Vogons | 75.0 | 50.0 | 6.0 | NaN | 4.0 | 4.0 | 60.0 |
| 3 | Betelgeusians | 200.0 | 90.0 | 9.0 | NaN | 15.0 | 1.0 | 95.0 |
| 4 | Pluvarians | 150.0 | 70.0 | 7.0 | NaN | 10.0 | 3.0 | 75.0 |


## Part I: Missing Data

#### Exercise 1 

1. Identify which variables have problems with missing data.
2. Calculate the percentage of missing data for those variables.

In [None]:
#1 Identify which variables have problems with missing data.

In [None]:
#2 Calculate the percentage of missing data for those variables
# Create a DataFrame of boolean values with True for missing values and False if there is not a missing value at that position:

#Calculate the percentage of missing data for each column

#Display missing_percentages
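
One possible way to approach both steps is sketched below. It is only an illustration: to stay self-contained it rebuilds the five preview rows shown by `et_data.head()` instead of loading the CSV, so the numbers it prints describe those five rows, not the full set of 50 civilizations. The names `preview`, `missing_mask`, `columns_with_missing`, and `missing_percentages` are placeholders chosen for this sketch.

```python
import numpy as np
import pandas as pd

# Five preview rows copied from the head() output (NOT the full dataset).
preview = pd.DataFrame({
    "Name_of_civilzation": ["Venusians", "Gargeleblobs", "Vogons",
                            "Betelgeusians", "Pluvarians"],
    "Years_since_first_contact": [50.0, 120.0, 75.0, 200.0, 150.0],
    "Technological_progress": [20.0, 85.0, 50.0, 90.0, 70.0],
    "Diplomatic_relations_index": [7.0, 8.0, 6.0, 9.0, 7.0],
    "Cultural_exchange_index": [5.0, np.nan, np.nan, np.nan, np.nan],
    "Joint_space_missions": [1.0, 12.0, 4.0, 15.0, 10.0],
    "Hostility_to_Earth_Index": [3.0, 2.0, 4.0, 1.0, 3.0],
    "Degree_of_positive_contact": [55.0, 88.0, 60.0, 95.0, 75.0],
})

# Boolean DataFrame: True where a value is missing, False otherwise.
missing_mask = preview.isna()

# 1. Count missing values per column and keep only columns that have any.
missing_counts = missing_mask.sum()
columns_with_missing = missing_counts[missing_counts > 0]
print(columns_with_missing)

# 2. The mean of a boolean column is the fraction of True values,
#    so multiplying by 100 gives the percentage missing per column.
missing_percentages = missing_mask.mean() * 100
print(missing_percentages)
```

To use this on the real data, replace `preview` with `et_data`; the percentages for the full dataset may differ from those of the preview rows.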

#### Exercise 2

- Decide what to do about the missing data in `Cultural_exchange_index`.

Review the cases in Part-2-Data-Cleaning-Techniques.ipynb and answer these questions:
1. Is it missing more than 50% of the data?
2. Is `Cultural_exchange_index` important to your data analysis? Explain.
3. Do you plan to delete the variable or remove the rows that contain missing values?

In [8]:
# Take the action you chose in 3.
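
The two actions named in question 3 could look roughly like the sketch below. It does not say which one is right for your analysis; that depends on your answers to questions 1 and 2. The small `sample` DataFrame is a stand-in built from three columns of the preview rows, and `without_column` and `without_rows` are names invented for this sketch.

```python
import numpy as np
import pandas as pd

# Stand-in for et_data: three columns taken from the five preview rows.
sample = pd.DataFrame({
    "Name_of_civilzation": ["Venusians", "Gargeleblobs", "Vogons",
                            "Betelgeusians", "Pluvarians"],
    "Cultural_exchange_index": [5.0, np.nan, np.nan, np.nan, np.nan],
    "Joint_space_missions": [1.0, 12.0, 4.0, 15.0, 10.0],
})

# Option A: delete the variable. Every row is kept, the column is gone.
without_column = sample.drop(columns=["Cultural_exchange_index"])

# Option B: remove only the rows where the variable is missing.
# The column is kept, but every row with a NaN in it is dropped.
without_rows = sample.dropna(subset=["Cultural_exchange_index"])

print(without_column.shape)  # (rows, columns) after Option A
print(without_rows.shape)    # (rows, columns) after Option B
```

Comparing the two shapes shows the trade-off: Option A loses one variable, while Option B loses every civilization that lacks a value for it.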



## Part II: Outliers

#### Exercise 1

Preview the outliers in `Hostility_to_Earth_Index` with a boxplot. 
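
A minimal boxplot sketch follows. The hostility scores in it are made up for illustration (the value 10 is planted so that an outlier appears); with the real data you would pass `et_data["Hostility_to_Earth_Index"]`, dropping any missing values first.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up scores for illustration only; 10 is planted as an obvious outlier.
hostility = pd.Series([3, 2, 4, 1, 3, 2, 3, 10],
                      name="Hostility_to_Earth_Index")

fig, ax = plt.subplots()

# Points beyond 1.5 * IQR from the box are drawn as individual markers.
box = ax.boxplot(hostility.to_numpy())

ax.set_xticks([1])
ax.set_xticklabels([hostility.name])
ax.set_ylabel("Index value (1-10)")
ax.set_title("Boxplot of Hostility_to_Earth_Index")
```

In a notebook the figure renders inline when the cell finishes; when running as a script, add `plt.show()` at the end.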

#### Exercise 2

Manage the outliers in `Hostility_to_Earth_Index`. 
1. Identify them using IQR.
2. What will you do with them, and why? (Answer here)
3. Implement your plan. 

In [None]:
#1. Identify outliers using IQR.

#Calculate the IQR 

#Define the boundaries for outliers

#Identify outliers

#Display the outliers
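
The IQR steps listed in the comments above can be sketched as follows, assuming the common 1.5 × IQR rule for the boundaries. The scores are made up for illustration, and the variable names (`q1`, `q3`, `iqr`, `lower_bound`, `upper_bound`, `outliers`) are choices made for this sketch.

```python
import pandas as pd

# Made-up scores for illustration only; swap in
# et_data["Hostility_to_Earth_Index"] for the real analysis.
hostility = pd.Series([3, 2, 4, 1, 3, 2, 3, 10],
                      name="Hostility_to_Earth_Index")

# Calculate the IQR: the distance between the 25th and 75th percentiles.
q1 = hostility.quantile(0.25)
q3 = hostility.quantile(0.75)
iqr = q3 - q1

# Define the boundaries for outliers (1.5 * IQR rule).
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

# Identify outliers: values below the lower or above the upper boundary.
outliers = hostility[(hostility < lower_bound) | (hostility > upper_bound)]

# Display the boundaries and the outliers.
print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
print(f"lower={lower_bound}, upper={upper_bound}")
print(outliers)
```

On a DataFrame, the same boolean condition applied to the column selects whole rows, which lets you see which civilizations the outlying values belong to.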

In [None]:
#3. Implement your plan. 
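
Two common plans are removing the outliers or capping them at the IQR boundaries; the sketch below shows both on made-up scores so you can compare the effect. Neither is presented as the expected answer: keeping the outliers can also be a defensible choice if the values are valid measurements. The names `removed` and `capped` are invented for this sketch.

```python
import pandas as pd

# Made-up scores for illustration only; 10 is the planted outlier.
hostility = pd.Series([3, 2, 4, 1, 3, 2, 3, 10],
                      name="Hostility_to_Earth_Index")

# Recompute the 1.5 * IQR boundaries so this cell runs on its own.
q1 = hostility.quantile(0.25)
q3 = hostility.quantile(0.75)
iqr = q3 - q1
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

# Plan A: remove the outliers (keep only values inside the boundaries).
removed = hostility[hostility.between(lower_bound, upper_bound)]

# Plan B: cap the outliers at the nearest boundary (no rows are lost).
capped = hostility.clip(lower=lower_bound, upper=upper_bound)

print(removed.describe())
print(capped.describe())
```

Plan A shrinks the sample, which matters with only 50 civilizations; Plan B keeps every row but replaces extreme scores with the boundary value.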

### Stop & Smell the Roses
- What questions/conjectures do you have at this point about variables that could serve as predictors of future successful partnerships?

-
-
-


### Experiment with graphs to check out your ideas
- See Simple_Plotting_Guide-pandas.ipynb
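
As a starting point for checking a conjecture, the sketch below draws a scatter plot of two variables and computes their Pearson correlation. The pairing of `Technological_progress` with `Joint_space_missions` is only an example, and the data are the five preview rows from `et_data.head()`, which are far too few to support any conclusion; rerun the idea on the full dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Two columns from the five preview rows (NOT the full dataset).
preview = pd.DataFrame({
    "Technological_progress": [20.0, 85.0, 50.0, 90.0, 70.0],
    "Joint_space_missions": [1.0, 12.0, 4.0, 15.0, 10.0],
})

# Scatter plot: each point is one civilization.
ax = preview.plot.scatter(x="Technological_progress",
                          y="Joint_space_missions")
ax.set_title("Joint space missions vs. technological progress")

# Pearson correlation between the two columns (ranges from -1 to 1).
r = preview["Technological_progress"].corr(preview["Joint_space_missions"])
print(f"Pearson r = {r:.3f}")
```

A correlation close to 1 or -1 suggests a variable worth a closer look, but it does not by itself show that one variable drives the other.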