# Extraterrestrial Diplomatic Service Project

#### The Problem Statement

It's 2733, and you are a data scientist for the Extraterrestrial Diplomatic Service. The Service is regularly tasked with providing guidance to the Galactic Council on potential trade and business relations with extraterrestrial civilizations. This analysis helps the Council understand the potential for fruitful collaborations.

At the annual meeting of the Extraterrestrial Diplomatic Service, presenters highlighted the success of several joint space missions conducted in collaboration with extraterrestrial civilizations. They suggested that these past achievements could indicate potential for expanding partnerships into areas such as trade and business. They wondered what other characteristics of extraterrestrial civilizations could serve as predictors of future successful partnerships.

**Your job** is to perform exploratory data analysis (EDA) on the dataset to begin this analysis.
**The goal** is to create a report that:
1. Recommends to the Council variables that could serve as predictors of future successful partnerships.
2. Backs up your suggestions with numerical data and graphs.

- Your dataset, `extraterrestrial_civilizations.csv`, contains information on a randomly selected set of 50 civilizations for the following variables:

`Name_of_civilzation`: The civilization's name

`Years_since_first_contact`: Number of years since humanity first made contact with this civilization. (0-300)

`Technological_progress`: A measure of the civilization's overall technological progress on a scale from 1 to 100.

`Diplomatic_relations_index`: A measure of diplomatic relations between Earth and the civilization on a scale from 1 to 10, with higher values indicating more positive relations.

`Cultural_exchange_index`: A measure of the degree of cultural exchange between Earth and the civilization on a scale from 1 to 10, with higher values indicating more exchange.

`Joint_space_missions`: The number of joint space missions between Earth and the civilization.

`Hostility_to_Earth_Index`: A measure of the civilization's hostility to Earth on a scale from 1 to 10, with higher values indicating more hostility.

`Degree_of_positive_contact`: A continuous variable measuring the degree of positive contact with Earth on a scale from 1 to 100, with higher values indicating more positive contact.



### Part I: Getting to know your dataset

### Exercises:

#### Exercise 1: Import the data

Import pandas, then read the `extraterrestrial_civilizations.csv` file into a DataFrame called `et_data`.

In [None]:
# Import the pandas library.

# read the CSV file into a DataFrame


#### Exercise 1 Answer

In [17]:
import pandas as pd

# Read the extraterrestrial_civilizations.csv file into a DataFrame named et_data
et_data = pd.read_csv("../datasets/Extraterrestrial_civilizations.csv")



#### Exercise 2 - Preview the data

View the first ten rows of the DataFrame `et_data`.

In [18]:
# View the first ten rows

#### Exercise 2 Answer

In [21]:
# View the first ten rows
et_data.head(10)

| | Name_of_civilzation | Years_since_first_contact | Technological_progress | Diplomatic_relations_index | Cultural_exchange_index | Joint_space_missions | Hostility_to_Earth_Index | Degree_of_positive_contact |
|---|---|---|---|---|---|---|---|---|
| 0 | Venusians | 50.0 | 20.0 | 7.0 | 5.0 | 1.0 | 3.0 | 55.0 |
| 1 | Gargeleblobs | 120.0 | 85.0 | 8.0 | NaN | 12.0 | 2.0 | 88.0 |
| 2 | Vogons | 75.0 | 50.0 | 6.0 | NaN | 4.0 | 4.0 | 60.0 |
| 3 | Betelgeusians | 200.0 | 90.0 | 9.0 | NaN | 15.0 | 1.0 | 95.0 |
| 4 | Pluvarians | 150.0 | 70.0 | 7.0 | NaN | 10.0 | 3.0 | 75.0 |
| 5 | Xytrons | 10.0 | 30.0 | 4.0 | NaN | 0.0 | 6.0 | 40.0 |
| 6 | Zarblatts | 80.0 | 60.0 | 14.0 | NaN | 8.0 | 2.0 | 70.0 |
| 7 | Kritons | 175.0 | 95.0 | 9.0 | NaN | 20.0 | 1.0 | 98.0 |
| 8 | Qooglians | 20.0 | 25.0 | 3.0 | NaN | 1.0 | 5.0 | 30.0 |
| 9 | Thumbers | 100.0 | 80.0 | 5.0 | NaN | 7.0 | 3.0 | 83.0 |


#### Exercise 3 - Check the dimensions

Check the dimensions of the `et_data` DataFrame.

In [1]:
# check the dimensions of your DataFrame


#### Exercise 3 Answer

In [None]:
# check the dimensions of your DataFrame
et_data.shape

#### Exercise 4 - Summarize the DataFrame

Show a quick summary of the `et_data` DataFrame.

In [None]:
# quick summary


#### Exercise 4 Answer

In [20]:
# quick summary
et_data.describe()

| | Years_since_first_contact | Technological_progress | Diplomatic_relations_index | Cultural_exchange_index | Joint_space_missions | Hostility_to_Earth_Index | Degree_of_positive_contact |
|---|---|---|---|---|---|---|---|
| count | 51.0 | 51.0 | 51.0 | 17.0 | 51.0 | 51.0 | 51.0 |
| mean | 96.862745 | 54.784314 | 6.686275 | 5.882353 | 6.784314 | 11.431373 | 61.431373 |
| std | 62.088331 | 25.864504 | 2.723896 | 1.363926 | 5.954204 | 55.709337 | 24.976193 |
| min | 5.0 | 5.0 | 1.0 | 3.0 | 0.0 | 1.0 | 8.0 |
| 25% | 47.5 | 33.5 | 5.0 | 5.0 | 2.0 | 1.0 | 41.0 |
| 50% | 90.0 | 55.0 | 7.0 | 6.0 | 5.0 | 3.0 | 62.0 |
| 75% | 145.0 | 77.5 | 9.0 | 7.0 | 10.5 | 4.0 | 81.5 |
| max | 230.0 | 95.0 | 14.0 | 8.0 | 20.0 | 400.0 | 100.0 |
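
A summary like this is a natural place to compare each column against its documented scale and to count missing values. The following is a minimal sketch of one way to do that, not part of the original exercise. The `sample` DataFrame and its values are made up for illustration; only the column names and the 1 to 10 scales come from the variable descriptions.

```python
import pandas as pd

# Hypothetical mini-sample that mimics the structure of et_data (values are made up).
sample = pd.DataFrame({
    "Diplomatic_relations_index": [7.0, 14.0, 3.0],
    "Cultural_exchange_index": [5.0, None, None],
    "Hostility_to_Earth_Index": [3.0, 2.0, 400.0],
})

# Count missing values per column.
missing_counts = sample.isna().sum()

# Documented scale for each index column (taken from the variable descriptions).
valid_ranges = {
    "Diplomatic_relations_index": (1, 10),
    "Cultural_exchange_index": (1, 10),
    "Hostility_to_Earth_Index": (1, 10),
}

# Count values outside the documented scale; missing values are not counted here.
out_of_range = {
    col: int((~sample[col].between(low, high) & sample[col].notna()).sum())
    for col, (low, high) in valid_ranges.items()
}

print(missing_counts)
print(out_of_range)
```

The same two checks could be run on `et_data` by swapping it in for `sample`.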


### Stop & Smell the Roses

List 3 things you notice or find interesting about the dataset so far:
1.
2.
3.


### Part II: Manipulating your DataFrame

#### Exercise 1: Subsetting by columns

Create a subset of the `et_data` DataFrame with the variables (columns) that you think might be most important for your analysis.

In [None]:
#Student Code Here


# Create a subset


#### Answers will vary


General code format:

```
# subset the DataFrame
_________ = et_data[[ '___' , '___' , ..., '___']]
```
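
As a concrete illustration of the general format, here is a small sketch on a made-up stand-in for `et_data`. The `toy` DataFrame, its values, and the choice of columns are assumptions for demonstration only; your own choice of columns may differ.

```python
import pandas as pd

# Hypothetical three-row stand-in for et_data (values are made up).
toy = pd.DataFrame({
    "Name_of_civilzation": ["A", "B", "C"],
    "Technological_progress": [20.0, 85.0, 50.0],
    "Joint_space_missions": [1.0, 12.0, 4.0],
    "Hostility_to_Earth_Index": [3.0, 2.0, 4.0],
})

# Double brackets: the inner list names the columns to keep; the result is a DataFrame.
toy_subset = toy[["Name_of_civilzation", "Joint_space_missions"]]

# Single brackets with one column name return a Series instead.
missions = toy["Joint_space_missions"]

print(toy_subset.shape)
print(type(missions).__name__)
```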

#### Exercise 2: Subsetting by rows

The Council is most interested in civilizations that made contact relatively recently and have advanced technology. Select the civilizations whose `Years_since_first_contact` value falls in the bottom 25% of that column and whose `Technological_progress` score is 7 or higher.

In [23]:
#Student Code Here

# Calculate the threshold for the bottom 25% of 'Years_since_first_contact'

# Filter based on the conditions

# Check that changes have been made with describe()



#### Exercise 2 Answer


In [26]:
# Selecting civilizations in the bottom 25% for 'Years_since_first_contact' and with 'Technological_progress' of 7 or higher

# Calculate the threshold for the bottom 25% of 'Years_since_first_contact'
years_threshold = et_data['Years_since_first_contact'].quantile(0.25)

# Filter based on the conditions
filtered_civilizations = et_data[
    (et_data['Years_since_first_contact'] <= years_threshold) & 
    (et_data['Technological_progress'] >= 7)
]

# Check that changes have been made with describe()
filtered_civilizations[['Years_since_first_contact', 'Technological_progress']].describe()

| | Years_since_first_contact | Technological_progress |
|---|---|---|
| count | 12.0 | 12.0 |
| mean | 25.416667 | 28.666667 |
| std | 11.171867 | 15.42332 |
| min | 10.0 | 10.0 |
| 25% | 18.75 | 18.0 |
| 50% | 25.0 | 25.0 |
| 75% | 31.25 | 34.75 |
| max | 45.0 | 63.0 |
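
To see how the quantile threshold and the combined boolean mask behave, it can help to trace them on a handful of rows. This is a sketch on made-up data, not the exercise dataset. The `toy` DataFrame is hypothetical, and the `|` variant is included only to contrast with the `&` used in the answer.

```python
import pandas as pd

# Hypothetical five-row example (values are made up).
toy = pd.DataFrame({
    "Years_since_first_contact": [10, 20, 30, 40, 50],
    "Technological_progress": [5, 60, 80, 6, 30],
})

# 25th percentile of the contact column (linear interpolation by default).
threshold = toy["Years_since_first_contact"].quantile(0.25)

recent = toy["Years_since_first_contact"] <= threshold
advanced = toy["Technological_progress"] >= 7

# & keeps rows where BOTH conditions hold; | keeps rows where EITHER holds.
both = toy[recent & advanced]
either = toy[recent | advanced]

print(threshold)
print(len(both), len(either))
```

Each condition needs its own parentheses when written inline, because `&` and `|` bind more tightly than comparison operators in Python.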


### Stop & Smell the Roses

- Are there ways that you think you'll want to manipulate or transform your data for analysis?
- List them or go ahead and do them. 

### Part III: Descriptive Statistics

#### Exercise 1 


The `value_counts()` method gives frequency counts for qualitative variables, so it doesn't make a lot of sense to use it on the `et_data` dataset. To try it out, import `zorga_animals.csv` as `zorga_animals` and find out how many animals live in each type of habitat on Zorga.

In [None]:
#Student Code here

# output the frequency of value occurrences in column
# Output summary value 

#### Exercise 1 Answer

In [25]:
# read CSV file into DataFrame
zorga_animals = pd.read_csv("../datasets/zorga_animals.csv")
 

# output the frequency of value occurrences in column
zorga_animals['Habitat'].value_counts()

Habitat
Zero Gravity Sky       11
Volcanic Mountains      6
Underground Caves       6
Lush Forests            6
High Gravity Plains     5
Frozen Tundra           5
Gas Swamps              5
Methane Oceans          3
Desert Plains           3
Name: count, dtype: int64
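
Counts like these can also be expressed as proportions by passing `normalize=True` to `value_counts()`. The sketch below uses a short made-up Series rather than the Zorga data; the `habitats` values are hypothetical.

```python
import pandas as pd

# Hypothetical habitat labels (not the Zorga dataset).
habitats = pd.Series(
    ["Caves", "Sky", "Sky", "Tundra", "Sky", "Caves"], name="Habitat"
)

# Raw frequency of each label, sorted from most to least common.
counts = habitats.value_counts()

# Same breakdown as a share of all rows (values sum to 1).
shares = habitats.value_counts(normalize=True)

print(counts)
print(shares)
```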

#### Exercise 2

How long ago did Earth make contact with the first civilization contacted in this dataset?

In [None]:
#Student Code here

# Output summary value 

#### Exercise 2 Answer


In [27]:
# output summary value 
et_data['Years_since_first_contact'].max()

np.float64(230.0)

#### Exercise 3

What is the extent of positive relations that the average civilization maintains with Earth?

In [None]:
#Student Code here

# Output summary value 

#### Exercise 3 Answer


In [30]:
# output summary value 
et_data['Degree_of_positive_contact'].mean()

np.float64(61.431372549019606)

#### Exercise 4

What are the characteristics of the most hostile civilizations?

In [None]:
#Student Code here

# Create variable

# groupby

# display to check


#### Exercise 4 Answer

In [None]:
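# Create variable: bin hostility scores into three ranges on the documented 1-10 scale.
# Scores outside the bins (such as the maximum of 400 in the summary above) become NaN.
et_data['hostility_range'] = pd.cut(et_data['Hostility_to_Earth_Index'], bins=[0, 3.3, 6.6, 10], labels=['Low', 'Medium', 'High'])


# groupby (observed=False keeps every category in the result, even if it has no rows)
mean_characteristics_by_hostility = et_data.groupby('hostility_range', observed=False)[['Years_since_first_contact', 'Technological_progress', 'Diplomatic_relations_index', 'Cultural_exchange_index', 'Joint_space_missions']].mean()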


# display to check
mean_characteristics_by_hostility
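
One detail of `pd.cut` worth checking before interpreting the grouped means is that any value outside the outermost bin edges is assigned `NaN` and therefore drops out of the groupby. The sketch below demonstrates this on a made-up Series. The `hostility` values are hypothetical, while the bins and labels match the answer above.

```python
import pandas as pd

# Hypothetical hostility scores, including one far outside the 1-10 scale.
hostility = pd.Series([1.0, 3.0, 5.0, 9.0, 400.0])

# Same bins and labels as in the answer; intervals are (0, 3.3], (3.3, 6.6], (6.6, 10].
ranges = pd.cut(hostility, bins=[0, 3.3, 6.6, 10], labels=["Low", "Medium", "High"])

# The out-of-range score gets no label, so it shows up as a missing value.
n_unlabelled = int(ranges.isna().sum())

print(ranges)
print(n_unlabelled)
```

Counting the unlabelled rows this way on `et_data['hostility_range']` shows how many civilizations are excluded from the grouped averages.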