# Extraterrestrial Diplomatic Service Project

#### The Problem Statement

It's 2733, and you are a data scientist for the Extraterrestrial Diplomatic Service. The Service is regularly tasked with providing guidance to the Galactic Council on potential trade and business relations with extraterrestrial civilizations. This analysis helps the Council understand the potential for fruitful collaborations.

At the annual meeting of the Extraterrestrial Diplomatic Service, presenters highlighted the success of several joint space missions conducted in collaboration with extraterrestrial civilizations. They suggested that these past achievements could indicate potential for expanding partnerships into areas such as trade and business. They wondered what other characteristics of extraterrestrial civilizations could serve as predictors of future successful partnerships.

**Your job** is to perform exploratory data analysis (EDA) on the dataset to begin this analysis.
**The goal** is to create a report that:
1. Recommends to the Council variables that could serve as predictors of future successful partnerships.
2. Backs up your suggestions with numerical data and graphs.

- Your dataset, `extraterrestrial_civilizations.csv`, contains information on a randomly selected set of 50 civilizations for the following variables:

`Name_of_civilzation`: The civilization's name

`Years_since_first_contact`: Number of years since humanity first made contact with this civilization. (0-300)

`Technological_progress`: A measure of the civilization's overall technological progress on a scale from 1 to 100.

`Diplomatic_relations_index`: A measure of diplomatic relations between Earth and the civilization on a scale from 1 to 10, with higher values indicating more positive relations.

`Cultural_exchange_index`: A measure of the degree of cultural exchange between Earth and the civilization on a scale from 1 to 10, with higher values indicating more exchange.

`Joint_space_missions`: The number of joint space missions between Earth and the civilization.

`Hostility_to_Earth_Index`: A measure of the civilization's hostility to Earth on a scale from 1 to 10, with higher values indicating more hostility.

`Degree_of_positive_contact`: A continuous variable measuring the degree of positive contact with Earth on a scale from 1 to 100, with higher values indicating more positive contact.



### Part I: Getting to know your dataset

### Exercises:

#### Exercise 1: Import the data

Import pandas, then read the `extraterrestrial_civilizations.csv` file into a DataFrame called `et_data`.

In [1]:
# Import the pandas library.
import pandas as pd
# read the CSV file into a DataFrame
et_data = pd.read_csv("../datasets/Extraterrestrial_civilizations.csv")

#### Exercise 2 - Preview the data

View the first ten rows of the DataFrame `et_data`.

In [2]:
# View the first ten rows
et_data.head(10)

Unnamed: 0,Name_of_civilzation,Years_since_first_contact,Technological_progress,Diplomatic_relations_index,Cultural_exchange_index,Joint_space_missions,Hostility_to_Earth_Index,Degree_of_positive_contact
0,Venusians,50.0,20.0,7.0,5.0,1.0,3.0,55.0
1,Gargeleblobs,120.0,85.0,8.0,,12.0,2.0,88.0
2,Vogons,75.0,50.0,6.0,,4.0,4.0,60.0
3,Betelgeusians,200.0,90.0,9.0,,15.0,1.0,95.0
4,Pluvarians,150.0,70.0,7.0,,10.0,3.0,75.0
5,Xytrons,10.0,30.0,4.0,,0.0,6.0,40.0
6,Zarblatts,80.0,60.0,14.0,,8.0,2.0,70.0
7,Kritons,175.0,95.0,9.0,,20.0,1.0,98.0
8,Qooglians,20.0,25.0,3.0,,1.0,5.0,30.0
9,Thumbers,100.0,80.0,5.0,,7.0,3.0,83.0


#### Exercise 3 - Check the dimensions

Check the dimensions of the `et_data` DataFrame.

In [3]:
# check the dimensions of your DataFrame
et_data.shape

(51, 8)
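
The shape reports 51 rows, while the dataset description mentions 50 civilizations. One possible explanation (an assumption, not something this notebook confirms) is a repeated entry. The sketch below shows how the row count could be compared against the number of unique names; `toy` is a made-up stand-in for `et_data`.

```python
import pandas as pd

# Hypothetical toy data with one deliberately repeated civilization.
toy = pd.DataFrame({
    "Name_of_civilzation": ["Venusians", "Vogons", "Venusians"],
    "Technological_progress": [20.0, 50.0, 20.0],
})

# shape is a (rows, columns) tuple.
n_rows, n_cols = toy.shape

# Number of distinct civilization names.
n_unique = toy["Name_of_civilzation"].nunique()

# All rows whose name appears more than once (keep=False marks every copy).
repeated = toy[toy.duplicated(subset="Name_of_civilzation", keep=False)]

print(n_rows, n_cols, n_unique)
print(repeated)
```

If `n_rows` and `n_unique` match on the real data, the extra row has some other cause and is worth inspecting directly.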

#### Exercise 4 - Summarize the DataFrame

Show a quick summary of the `et_data` DataFrame.

In [20]:
# quick summary
et_data.describe()
# test_df = et_data.describe()
# test_df.loc['25%','Years_since_first_contact']

47.5
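
The commented-out lines in the cell above look up a single statistic from the `describe()` result. As a minimal sketch of that idea, the example below uses a small toy DataFrame (values copied from the first five preview rows, so the numbers will not match the full dataset) and shows that the `25%` row of `describe()` agrees with `quantile(0.25)`.

```python
import pandas as pd

# Toy stand-in for et_data, built from the first five preview rows.
toy = pd.DataFrame({
    "Years_since_first_contact": [50.0, 120.0, 75.0, 200.0, 150.0],
    "Technological_progress": [20.0, 85.0, 50.0, 90.0, 70.0],
})

# describe() returns a DataFrame indexed by statistic name.
summary = toy.describe()

# Look up one statistic for one column by row label and column label.
q1_from_describe = summary.loc["25%", "Years_since_first_contact"]

# The same value computed directly.
q1_direct = toy["Years_since_first_contact"].quantile(0.25)

print(q1_from_describe, q1_direct)
```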

### Stop & Smell the Roses

1. What preliminary hypotheses/conjectures do you have at this point? # Null hypothesis: we have no enemies in our galaxy. Alternative (reject the null): we have at least one enemy in our galaxy.
2. Have you thought of any ways you'll want to transform your data? (e.g., new variables you want to create, columns you want to drop/add, ...) #
3. What problems have you noticed with the data that you'll want to clean up when we get to data cleaning? # Outliers in `Hostility_to_Earth_Index`, missing (NaN) values in `Cultural_exchange_index`.
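
The problems noted in the last answer can be checked numerically. This is a minimal sketch on a toy DataFrame (three rows copied from the preview of `et_data`); the choice of which columns to range-check is an assumption based on the 1 to 10 scales given in the variable descriptions.

```python
import numpy as np
import pandas as pd

# Toy stand-in for et_data; values copied from three preview rows.
toy = pd.DataFrame({
    "Name_of_civilzation": ["Venusians", "Gargeleblobs", "Zarblatts"],
    "Diplomatic_relations_index": [7.0, 8.0, 14.0],
    "Cultural_exchange_index": [5.0, np.nan, np.nan],
    "Hostility_to_Earth_Index": [3.0, 2.0, 2.0],
})

# Count missing values in each column.
missing_counts = toy.isna().sum()

# Flag rows where any of the chosen index columns falls outside 1-10.
index_cols = ["Diplomatic_relations_index", "Hostility_to_Earth_Index"]
in_range = toy[index_cols].apply(lambda col: col.between(1, 10))
suspect_rows = toy[~in_range.all(axis=1)]

print(missing_counts)
print(suspect_rows)
```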


### Part II: Manipulating your DataFrame

#### Exercise 1: Subsetting by columns

Create a subset of the `et_data` DataFrame with the variables (columns) that you think might be most important for your analysis.

In [6]:
#Student Code Here

# Create a subset with the columns most relevant to the analysis
# (Cultural_exchange_index is left out)
et_data_subset = et_data[['Name_of_civilzation', 'Years_since_first_contact','Technological_progress','Diplomatic_relations_index', 'Joint_space_missions', 'Hostility_to_Earth_Index', 'Degree_of_positive_contact']]

# Preview the first ten rows of the subset
et_data_subset.head(10)

Unnamed: 0,Name_of_civilzation,Years_since_first_contact,Technological_progress,Diplomatic_relations_index,Joint_space_missions,Hostility_to_Earth_Index,Degree_of_positive_contact
0,Venusians,50.0,20.0,7.0,1.0,3.0,55.0
1,Gargeleblobs,120.0,85.0,8.0,12.0,2.0,88.0
2,Vogons,75.0,50.0,6.0,4.0,4.0,60.0
3,Betelgeusians,200.0,90.0,9.0,15.0,1.0,95.0
4,Pluvarians,150.0,70.0,7.0,10.0,3.0,75.0
5,Xytrons,10.0,30.0,4.0,0.0,6.0,40.0
6,Zarblatts,80.0,60.0,14.0,8.0,2.0,70.0
7,Kritons,175.0,95.0,9.0,20.0,1.0,98.0
8,Qooglians,20.0,25.0,3.0,1.0,5.0,30.0
9,Thumbers,100.0,80.0,5.0,7.0,3.0,83.0
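
When only one or two columns are being left out, dropping them can be shorter than listing every column to keep. The sketch below is one alternative way to build a similar subset, shown on a toy DataFrame with fewer columns than `et_data`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for et_data with a reduced set of columns.
toy = pd.DataFrame({
    "Name_of_civilzation": ["Venusians", "Gargeleblobs"],
    "Technological_progress": [20.0, 85.0],
    "Cultural_exchange_index": [5.0, np.nan],
})

# drop(columns=...) returns a new DataFrame and leaves the original unchanged.
toy_subset = toy.drop(columns=["Cultural_exchange_index"])

print(toy_subset.columns.tolist())
```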


#### Exercise 2: Subsetting by rows

The Council is most interested in civilizations that made contact relatively recently and have advanced technology. Select the civilizations whose `Years_since_first_contact` value is in the bottom 25% of values and whose `Technological_progress` score is 7 or higher.

In [31]:
#Student Code Here

# Calculate the threshold for the bottom 25% of 'Years_since_first_contact'
contact_threshold = et_data_subset['Years_since_first_contact'].quantile(0.25)

# Filter based on the conditions. The last two conditions go beyond the exercise:
# they drop rows whose index values exceed the documented 1-10 scale.
recent_advanced_civilizations = et_data_subset[
    (et_data_subset['Years_since_first_contact'] <= contact_threshold) &
    (et_data_subset['Technological_progress'] >= 7) &
    (et_data_subset['Hostility_to_Earth_Index'] <= 10) &
    (et_data_subset['Diplomatic_relations_index'] <= 10)
]
# Summarize the filtered rows with describe()
recent_advanced_civilizations.describe()


Unnamed: 0,Years_since_first_contact,Technological_progress,Diplomatic_relations_index,Joint_space_missions,Hostility_to_Earth_Index,Degree_of_positive_contact
count,10.0,10.0,10.0,10.0,10.0,10.0
mean,24.5,30.3,4.0,1.1,5.0,35.9
std,11.890706,16.506228,1.632993,1.197219,1.414214,16.189503
min,10.0,10.0,2.0,0.0,2.0,15.0
25%,16.25,17.5,3.0,0.0,4.25,26.25
50%,22.5,27.5,4.0,1.0,5.0,34.0
75%,30.0,38.25,4.75,1.75,6.0,41.5
max,45.0,63.0,7.0,3.0,7.0,69.0
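
The row filter above combines a quantile threshold with boolean masks. Here is a minimal sketch of the same pattern on the ten preview rows only, so the threshold differs from the one computed on the full dataset. Each condition needs its own parentheses because `&` binds more tightly than comparison operators in Python.

```python
import pandas as pd

# Toy stand-in built from the ten preview rows of et_data.
toy = pd.DataFrame({
    "Name_of_civilzation": ["Venusians", "Gargeleblobs", "Vogons", "Betelgeusians",
                            "Pluvarians", "Xytrons", "Zarblatts", "Kritons",
                            "Qooglians", "Thumbers"],
    "Years_since_first_contact": [50.0, 120.0, 75.0, 200.0, 150.0,
                                  10.0, 80.0, 175.0, 20.0, 100.0],
    "Technological_progress": [20.0, 85.0, 50.0, 90.0, 70.0,
                               30.0, 60.0, 95.0, 25.0, 80.0],
})

# Value below which 25% of the observations fall (linear interpolation).
threshold = toy["Years_since_first_contact"].quantile(0.25)

# Build each boolean mask separately, then combine them element-wise with &.
recent = toy["Years_since_first_contact"] <= threshold
advanced = toy["Technological_progress"] >= 7
selected = toy[recent & advanced]

print(threshold)
print(selected["Name_of_civilzation"].tolist())
```

Naming the masks first keeps the final selection readable and makes it easy to inspect how many rows each condition keeps on its own.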


### Stop & Smell the Roses

1. Have your initial hypotheses/conjectures changed on further investigation of the data?
2. What additional issues have you noticed with the data that you'll want to clean up when we get to data cleaning?
3. Have you thought of any ways you'll want to transform your project data? (e.g., new variables you want to create, columns you want to drop/add, ...)
    - List them or go ahead and do them.
    - Removed `Cultural_exchange_index`, kept only the bottom 25% of `Years_since_first_contact`, and removed rows with any index above 10.

### Part III: Descriptive Statistics

#### Exercise 1 


The `value_counts()` method gives frequency counts for qualitative variables, so it doesn't make a lot of sense to use it on the `et_data` dataset. To try it out, import `zorga_animals.csv` as `zorga_animals` and find out how many animals live in each type of habitat on Zorga.

In [None]:
#Student Code here

# output the frequency of value occurrences in column
# Output summary value 
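
As a rough illustration of `value_counts()`, the sketch below uses an invented toy DataFrame. The column names `animal` and `habitat` and all of the values are hypothetical, since the contents of `zorga_animals.csv` are not shown here; check the real column names after loading the file.

```python
import pandas as pd

# Hypothetical stand-in for zorga_animals; names and values are made up.
zorga_toy = pd.DataFrame({
    "animal": ["glorp", "snarf", "blix", "quon", "trell", "vim"],
    "habitat": ["swamp", "cave", "swamp", "desert", "swamp", "cave"],
})

# Frequency of each category, sorted from most to least common.
habitat_counts = zorga_toy["habitat"].value_counts()

print(habitat_counts)
```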

#### Exercise 2

How long ago did Earth make contact with the first civilization contacted in this dataset?

In [None]:
#Student Code here

# Output summary value 
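
One way to read this question is as the maximum of `Years_since_first_contact`. The sketch below shows that approach on the ten preview rows only, so the answer for the full dataset may differ.

```python
import pandas as pd

# Toy stand-in built from the ten preview rows of et_data.
toy = pd.DataFrame({
    "Name_of_civilzation": ["Venusians", "Gargeleblobs", "Vogons", "Betelgeusians",
                            "Pluvarians", "Xytrons", "Zarblatts", "Kritons",
                            "Qooglians", "Thumbers"],
    "Years_since_first_contact": [50.0, 120.0, 75.0, 200.0, 150.0,
                                  10.0, 80.0, 175.0, 20.0, 100.0],
})

# Largest number of years since first contact.
earliest_contact_years = toy["Years_since_first_contact"].max()

# idxmax() gives the row label of that maximum, which identifies the civilization.
earliest_row = toy["Years_since_first_contact"].idxmax()
earliest_name = toy.loc[earliest_row, "Name_of_civilzation"]

print(earliest_contact_years, earliest_name)
```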

#### Exercise 3

What is the extent of positive relations that the average civilization maintains with Earth?

In [None]:
#Student Code here

# Output summary value 
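
A possible approach, assuming `Degree_of_positive_contact` is the column that best matches this question, is to take its mean. The sketch uses the ten preview rows only and also computes the median, which is less sensitive to extreme values.

```python
import pandas as pd

# Degree_of_positive_contact values from the ten preview rows of et_data.
toy = pd.DataFrame({
    "Degree_of_positive_contact": [55.0, 88.0, 60.0, 95.0, 75.0,
                                   40.0, 70.0, 98.0, 30.0, 83.0],
})

# Arithmetic mean of the column.
mean_contact = toy["Degree_of_positive_contact"].mean()

# Median for comparison.
median_contact = toy["Degree_of_positive_contact"].median()

print(mean_contact, median_contact)
```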

#### Exercise 4

What are the characteristics of the most hostile civilizations?

In [None]:
#Student Code here

# Create variable

# groupby

# display to check
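
The outline above (create a variable, group by it, display) could be sketched as follows. The cutoff of 5 for "high" hostility and the new column name `Hostility_level` are assumptions made for illustration, and the data are the ten preview rows only.

```python
import numpy as np
import pandas as pd

# Toy stand-in built from the ten preview rows of et_data.
toy = pd.DataFrame({
    "Technological_progress": [20.0, 85.0, 50.0, 90.0, 70.0,
                               30.0, 60.0, 95.0, 25.0, 80.0],
    "Hostility_to_Earth_Index": [3.0, 2.0, 4.0, 1.0, 3.0,
                                 6.0, 2.0, 1.0, 5.0, 3.0],
    "Degree_of_positive_contact": [55.0, 88.0, 60.0, 95.0, 75.0,
                                   40.0, 70.0, 98.0, 30.0, 83.0],
})

# Create variable: label each row by hostility level (hypothetical cutoff of 5).
toy["Hostility_level"] = np.where(toy["Hostility_to_Earth_Index"] >= 5, "high", "lower")

# groupby: average characteristics within each hostility level.
profile = toy.groupby("Hostility_level")[
    ["Technological_progress", "Degree_of_positive_contact"]
].mean()

# Display to check.
print(profile)
```

Comparing the `high` row with the `lower` row gives a first impression of how the most hostile civilizations differ on the other variables.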


### Stop & Smell the Roses

1. Have your hypotheses/conjectures changed on further investigation of the data?
2. What additional issues have you noticed with the data that you'll want to clean up when we get to data cleaning?
3. Have you thought of any more ways you'll want to transform your project data? (e.g., new variables you want to create, columns you want to drop/add, ...)
4. What visualizations do you think might help you investigate your hypotheses/conjectures? (We'll dig into these next class)