# Data Collection

In this capstone, we will predict whether the Falcon 9 first stage will land successfully. SpaceX advertises Falcon 9 rocket launches on its website at a cost of 62 million dollars; other providers cost upward of 165 million dollars each. Much of the savings comes from the fact that SpaceX can reuse the first stage. Therefore, if we can determine whether the first stage will land, we can determine the cost of a launch. This information can be used if another company wants to bid against SpaceX for a rocket launch. In this lab, you will collect data from an API and make sure it is in the correct format. The following is an example of a successful launch.

![spaceX Data](https://www.teslarati.com/wp-content/uploads/2020/04/Falcon-Heavy-Demo-Feb-2018-SpaceX-1-crop-2048x956.jpg)

# Objectives

In this lab, you will make a GET request to the SpaceX API. You will also do some basic data wrangling and formatting.

1. Request to the SpaceX API
2. Clean the requested data
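
The request step can be sketched without touching the network. The snippet below is a minimal illustration, not the lab's implementation: `build_url` and `parse_response` are hypothetical helper names, and the sample body is invented for the example. It shows how a resource name and an optional ID combine into an endpoint URL, and how a response body is decoded only when the status code signals success.

```python
import json

BASE_URL = "https://api.spacexdata.com/v4/"  # base URL used throughout this lab


def build_url(resource, resource_id=None):
    """Join the base URL, a resource name and an optional resource ID."""
    url = BASE_URL + resource.strip("/")
    if resource_id:
        url += f"/{resource_id}"
    return url


def parse_response(status_code, body_text):
    """Return the decoded JSON body on success, otherwise raise ValueError."""
    if status_code not in (200, 201):
        raise ValueError(f"request failed with status {status_code}")
    return json.loads(body_text)


# Hypothetical response body, shaped like one element of launches/past.
sample_body = '[{"flight_number": 1, "rocket": "5e9d0d95eda69955f709d1eb"}]'
launches = parse_response(200, sample_body)

# The rocket ID from the launch record becomes part of a second lookup URL.
print(build_url("rockets", launches[0]["rocket"]))
```

Keeping URL construction and response checking in small functions makes each one easy to verify separately from the actual HTTP call.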

In [15]:
#import necessary libraries
import requests
import pandas as pd
# NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays
import numpy as np
# Datetime is a library that allows us to represent dates
import datetime


In [11]:
# Create a Client for SpaceX
class SpaceXClient:
    """
    This is a spacex client that will help to fire queries
    """
    
    def __init__(self,*args,**kwargs):
        self.base_url = "https://api.spacexdata.com/v4/"
        
    def format_get_error(self, error:dict)->str:
        # `error` is the decoded JSON body of a failed response, so it is a dict, not a str
        return f"{error.get('error', '')}"
    
    def get(self,resource,resource_id=None,*args,**kwargs)->dict:
        """
        This function will handle the get requests for SpaceX
        """
        result = {'json':None, "f_error" :"something went wrong ..."}
        try:
            url = self.base_url + f"{resource}"
            if resource_id:
                url+=f"/{resource_id}"
            result['url'] = url
            # A timeout keeps the call from hanging indefinitely on a stalled connection
            response = requests.get(url, timeout=30)
            result['status_code'] = response.status_code or ''
            response = response.json()
            
            if result["status_code"] in [200,201]:
                result['json'] = response
                # Clear the default error message on success
                result["f_error"] = None
            else:
                result["f_error"] = self.format_get_error(response)
                result["error"] = response
                
        except Exception as e:
            # str(e) is safe even when the exception carries no arguments
            result['error'] = result["f_error"] = str(e)
            
        return result
    
    def post(self,resource:str,resource_id=None,*args,**kwargs):
        pass
    
    def delete(self,resource,resource_id=None,*args,**kwargs):
        pass
    

In [56]:
sx = SpaceXClient()
print(sx.get('launches/past')['json'][0])

{'fairings': {'reused': False, 'recovery_attempt': False, 'recovered': False, 'ships': []}, 'links': {'patch': {'small': 'https://images2.imgbox.com/3c/0e/T8iJcSN3_o.png', 'large': 'https://images2.imgbox.com/40/e3/GypSkayF_o.png'}, 'reddit': {'campaign': None, 'launch': None, 'media': None, 'recovery': None}, 'flickr': {'small': [], 'original': []}, 'presskit': None, 'webcast': 'https://www.youtube.com/watch?v=0a_00nJ_Y88', 'youtube_id': '0a_00nJ_Y88', 'article': 'https://www.space.com/2196-spacex-inaugural-falcon-1-rocket-lost-launch.html', 'wikipedia': 'https://en.wikipedia.org/wiki/DemoSat'}, 'static_fire_date_utc': '2006-03-17T00:00:00.000Z', 'static_fire_date_unix': 1142553600, 'net': False, 'window': 0, 'rocket': '5e9d0d95eda69955f709d1eb', 'success': False, 'failures': [{'time': 33, 'altitude': None, 'reason': 'merlin engine failure'}], 'details': 'Engine failure at 33 seconds and loss of vehicle', 'crew': [], 'ships': [], 'capsules': [], 'payloads': ['5eb0e4b5b6c3bb0006eeb1e1'

### Important Points
You will notice that a lot of the data are IDs. For example, the rocket column has no information about the rocket, just an identification number.

We will now use the API again to get information about the launches using the IDs given for each launch. Specifically, we will be using the columns **rocket, payloads, launchpad, and cores**.

From the **rocket** we would like to learn the booster name.

From the **payloads** we would like to learn the mass of the payload and the orbit that it is going to.

From the **launchpad** we would like to know the name of the launch site being used, the longitude, and the latitude.

From **cores** we would like to learn the outcome of the landing, the type of the landing, the number of flights with that core, whether gridfins were used, whether the core is reused, whether legs were used, the landing pad used, the block of the core (a number used to separate versions of cores), the number of times this specific core has been reused, and the serial of the core.
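
As a rough illustration of the **cores** step, the sketch below flattens one entry of a launch's `cores` list into named fields. The helper name `summarize_core` and the sample values are hypothetical; the dictionary keys are the ones read by the lab's code. The block, reuse count, and serial are not included because they require a further lookup against the cores endpoint.

```python
def summarize_core(core):
    """Flatten one entry of a launch's `cores` list into named fields."""
    return {
        # Outcome combines landing success and landing type, e.g. "True ASDS"
        "Outcome": f"{core['landing_success']} {core['landing_type']}",
        "Flights": core["flight"],
        "GridFins": core["gridfins"],
        "Reused": core["reused"],
        "Legs": core["legs"],
        "LandingPad": core["landpad"],
    }


# Hypothetical entry: the keys follow this lab, the values are invented.
sample_core = {
    "core": "hypothetical-core-id",
    "flight": 1,
    "gridfins": False,
    "legs": False,
    "reused": False,
    "landing_success": False,
    "landing_type": "Ocean",
    "landpad": None,
}

summary = summarize_core(sample_core)
print(summary["Outcome"])  # prints "False Ocean"
```

When no landing was attempted, both landing fields are `None`, so the combined outcome reads `None None`.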

In [53]:
# Normalize the data and return pandas DataFrame
class DataCollection:
    
    def __init__(self,*args,**kwargs):
        """
        Constructor for this class.
        """
        self.client = SpaceXClient()
        self.launches_data = self.get_past_launches()
        
        # define the global lists       
        self.booster_version = []
        self.launch_site = []
        self.lon = []
        self.lat = []
        self.payload = []
        self.orbit = []
        self.block = []
        self.reuse_count = []
        self.serial = []
        self.outcome = []
        self.flights = []
        self.gridfins = []
        self.reused = []
        self.legs = []
        self.landingpad = []
    
    def get_past_launches(self):
        """
        Call the Client function to hit request for past launches
        """
        data = pd.json_normalize(self.client.get(resource = 'launches/past').get('json'))
        return self.get_useful_features(data)
        
        
    def get_useful_features(self,data):
        
        # Let's take a subset of our dataframe, keeping only the features we want plus flight_number and date_utc.
        data = data[['rocket', 'payloads', 'launchpad', 'cores', 'flight_number', 'date_utc']]

        # Remove rows with multiple cores (Falcon rockets with 2 extra boosters) and rows with multiple payloads in a single rocket.
        data = data[data['cores'].map(len)==1]
        # .copy() gives an independent frame, so the column assignments below do not trigger SettingWithCopyWarning
        data = data[data['payloads'].map(len)==1].copy()

        # Since payloads and cores are now lists of size 1, extract the single value from each list and replace the feature.
        data['cores'] = data['cores'].map(lambda x : x[0])
        data['payloads'] = data['payloads'].map(lambda x : x[0])

        # Convert date_utc to a datetime datatype and then extract the date, dropping the time
        data['date'] = pd.to_datetime(data['date_utc']).dt.date

        # Using the date, restrict the launches to those on or before 2020-11-13
        data = data[data['date'] <= datetime.date(2020, 11, 13)]
        
        return data

    # Takes the dataset and uses the rocket column to call the API and append the data to the list
    def getBoosterVersion(self):
        for rocket_id in self.launches_data['rocket']:
            response = self.client.get(resource = "rockets", resource_id=str(rocket_id)).get('json',[])
            self.booster_version.append(response['name'])
            
    # Takes the dataset and uses the launchpad column to call the API and append the data to the list
    def getLaunchSite(self):
        for launchpad_id in self.launches_data['launchpad']:
            response = self.client.get(resource = "launchpads", resource_id = str(launchpad_id)).get('json',[])
            self.lon.append(response['longitude'])
            self.lat.append(response['latitude'])
            self.launch_site.append(response['name'])
            
    # Takes the dataset and uses the payloads column to call the API and append the data to the lists
    def getPayloadData(self):
        for load in self.launches_data['payloads']:
            response = self.client.get(resource = "payloads", resource_id = load).get('json',[])
            self.payload.append(response['mass_kg'])
            self.orbit.append(response['orbit'])
    
    # Takes the dataset and uses the cores column to call the API and append the data to the lists
    def getCoreData(self):
        for core in self.launches_data['cores']:
            # Block, reuse count and serial live on the core resource, so they need an extra API call
            if core['core'] is not None:
                response = self.client.get(resource = "cores", resource_id = core['core']).get('json',[])
                self.block.append(response['block'])
                self.reuse_count.append(response['reuse_count'])
                self.serial.append(response['serial'])
            else:
                self.block.append(None)
                self.reuse_count.append(None)
                self.serial.append(None)
            # The remaining fields are already present in the launch record itself
            self.outcome.append(str(core['landing_success'])+' '+str(core['landing_type']))
            self.flights.append(core['flight'])
            self.gridfins.append(core['gridfins'])
            self.reused.append(core['reused'])
            self.legs.append(core['legs'])
            self.landingpad.append(core['landpad'])
            
    def to_dataframe(self):
        """
        Prepare a pandas dataframe
        """
        self.getBoosterVersion()
        self.getPayloadData()
        self.getLaunchSite()
        self.getCoreData()
        
        launch_dict = {'FlightNumber': list(self.launches_data['flight_number']),
                        'Date': list(self.launches_data['date']),
                        'BoosterVersion':self.booster_version,
                        'PayloadMass':self.payload,
                        'Orbit':self.orbit,
                        'LaunchSite':self.launch_site,
                        'Outcome':self.outcome,
                        'Flights':self.flights,
                        'GridFins':self.gridfins,
                        'Reused':self.reused,
                        'Legs':self.legs,
                        'LandingPad':self.landingpad,
                        'Block':self.block,
                        'ReusedCount':self.reuse_count,
                        'Serial':self.serial,
                        'Longitude': self.lon,
                        'Latitude': self.lat}
        
        return pd.DataFrame(launch_dict)
        

In [54]:
spacex = DataCollection().to_dataframe()

In [57]:
spacex.head()

| | FlightNumber | Date | BoosterVersion | PayloadMass | Orbit | LaunchSite | Outcome | Flights | GridFins | Reused | Legs | LandingPad | Block | ReusedCount | Serial | Longitude | Latitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2006-03-24 | Falcon 1 | 20.0 | LEO | Kwajalein Atoll | None None | 1 | False | False | False | | | 0 | Merlin1A | 167.743129 | 9.047721 |
| 1 | 2 | 2007-03-21 | Falcon 1 | | LEO | Kwajalein Atoll | None None | 1 | False | False | False | | | 0 | Merlin2A | 167.743129 | 9.047721 |
| 2 | 4 | 2008-09-28 | Falcon 1 | 165.0 | LEO | Kwajalein Atoll | None None | 1 | False | False | False | | | 0 | Merlin2C | 167.743129 | 9.047721 |
| 3 | 5 | 2009-07-13 | Falcon 1 | 200.0 | LEO | Kwajalein Atoll | None None | 1 | False | False | False | | | 0 | Merlin3C | 167.743129 | 9.047721 |
| 4 | 6 | 2010-06-04 | Falcon 9 | | LEO | CCSFS SLC 40 | None None | 1 | False | False | False | | 1.0 | 0 | B0003 | -80.577366 | 28.561857 |


In [59]:
spacex.shape

(94, 17)

## Filter the dataframe to only include *Falcon 9* launches

Finally, we will remove the Falcon 1 launches, keeping only the Falcon 9 launches. Filter the `spacex` dataframe using the **BoosterVersion** column to only keep the Falcon 9 launches. Save the filtered data to a new dataframe called **data_falcon9**.
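
The filtering idea can be tried on a small frame first. This is a minimal sketch with invented rows; only the column names follow this lab, and `toy` and `falcon9_only` are hypothetical variable names. It shows a boolean mask on **BoosterVersion** followed by `.copy()`, which returns an independent dataframe rather than a slice of the original.

```python
import pandas as pd

# Toy frame with invented values; only the column names follow this lab.
toy = pd.DataFrame({
    "FlightNumber": [1, 2, 4, 6, 7],
    "BoosterVersion": ["Falcon 1", "Falcon 1", "Falcon 9", "Falcon 1", "Falcon 9"],
})

# Boolean mask: True for every row that is not a Falcon 1 launch.
mask = toy["BoosterVersion"] != "Falcon 1"

# .copy() makes the result independent of `toy`, so later assignments are unambiguous.
falcon9_only = toy[mask].copy()

print(falcon9_only)
```

Note that the filtered frame keeps the original row labels, so its index is no longer a contiguous range.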

In [61]:
# .copy() makes data_falcon9 an independent dataframe, which avoids SettingWithCopyWarning on later assignments
data_falcon9 = spacex[spacex['BoosterVersion']!='Falcon 1'].copy()

Now that we have removed some values, we should reset the **FlightNumber** column.

In [65]:
data_falcon9.loc[:, 'FlightNumber'] = list(range(1, data_falcon9.shape[0]+1))


In [66]:
data_falcon9.head()

| | FlightNumber | Date | BoosterVersion | PayloadMass | Orbit | LaunchSite | Outcome | Flights | GridFins | Reused | Legs | LandingPad | Block | ReusedCount | Serial | Longitude | Latitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 1 | 2010-06-04 | Falcon 9 | | LEO | CCSFS SLC 40 | None None | 1 | False | False | False | | 1.0 | 0 | B0003 | -80.577366 | 28.561857 |
| 5 | 2 | 2012-05-22 | Falcon 9 | 525.0 | LEO | CCSFS SLC 40 | None None | 1 | False | False | False | | 1.0 | 0 | B0005 | -80.577366 | 28.561857 |
| 6 | 3 | 2013-03-01 | Falcon 9 | 677.0 | ISS | CCSFS SLC 40 | None None | 1 | False | False | False | | 1.0 | 0 | B0007 | -80.577366 | 28.561857 |
| 7 | 4 | 2013-09-29 | Falcon 9 | 500.0 | PO | VAFB SLC 4E | False Ocean | 1 | False | False | False | | 1.0 | 0 | B1003 | -120.610829 | 34.632093 |
| 8 | 5 | 2013-12-03 | Falcon 9 | 3170.0 | GTO | CCSFS SLC 40 | None None | 1 | False | False | False | | 1.0 | 0 | B1004 | -80.577366 | 28.561857 |


# Save DataFrame as File

In [67]:
data_falcon9.to_csv('../Data/dataset_part_1.csv', index=False)

# End Part 1