# Question

Our data science team has predicted that the Earth is going to be invaded by an alien force in the
coming years. Our only hope is to replicate a device that can block all alien technology within a radius of
~300 km. Sadly, the device was sold in 2004 to an anonymous buyer to protect her hometown, and
we don't know how to contact her again. We know that the device has been active since 2004 in one
city in the USA, and we want to know where to start our search.

We've included a dataset called ufo.csv. This dataset contains over 80,000 reports of UFO sightings
over the last century (all of them verified by the ESA). Using this dataset, try to guess the city in
which the device has been hidden.

In [1]:
import pandas as pd
import numpy as np
import folium

In [2]:
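# Read "datetime" as plain strings: some values use "24:00", which the date parser rejects, so it is converted in a later cell
df = pd.read_csv("ufo.csv")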

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 78509 entries, 0 to 78508
Data columns (total 14 columns):
Unnamed: 0     78509 non-null int64
datetime       78509 non-null object
city           78509 non-null object
state          72714 non-null object
country        68947 non-null object
shape          76599 non-null object
duration       78509 non-null float64
total_time     78509 non-null object
comments       78495 non-null object
date_posted    78509 non-null object
latitude       78509 non-null float64
longitude      78509 non-null float64
year           78509 non-null int64
distance       78509 non-null float64
dtypes: float64(4), int64(2), object(8)
memory usage: 8.4+ MB


In [4]:
# Drop the columns we don't need for our mission, then drop rows with missing values in the columns we rely on
df.drop(columns=['Unnamed: 0', 'shape', 'date_posted', 'duration', 'total_time', 'comments'], inplace=True)
df.dropna(subset=['datetime', 'city', 'country'], inplace=True)
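Because `dropna` only inspects the columns listed in `subset`, `state` still has missing values afterwards (66,406 non-null out of 68,947 in the next `info()` call). A minimal sketch on a hypothetical three-row frame; the city name `somewhere` and all values are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame mimicking the ufo.csv layout:
# one row lacks a state, another lacks a country.
toy = pd.DataFrame({
    "city": ["san marcos", "chester (uk/england)", "somewhere"],
    "state": ["tx", np.nan, "ca"],
    "country": ["us", "gb", np.nan],
})

# Only NaNs in the listed columns cause a row to be dropped;
# NaNs elsewhere (here: 'state') are kept.
cleaned = toy.dropna(subset=["city", "country"])
print(cleaned)
```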

In [5]:
# Convert "datetime" to real timestamps and set it as the index so we can filter by date.
# Note: "24:00" is mapped to 0:00 of the same day, so those reports end up one day early.
df["datetime"] = df['datetime'].str.replace('24:00', '0:00', regex=False)
df["datetime"] = pd.to_datetime(df["datetime"], format="%m/%d/%Y %H:%M")
df.set_index('datetime', inplace=True)
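Replacing `'24:00'` with `'0:00'` keeps the parser happy but moves those reports to the start of the same day. If the exact day matters, one option is to parse `'24:00'` as `00:00` and then add one day. A sketch assuming `'24:00'` only ever appears as the time part at the end of the string; the two input strings are hypothetical:

```python
import pandas as pd

# Hypothetical raw strings in the same "%m/%d/%Y %H:%M" layout as ufo.csv;
# the second one uses the non-standard "24:00" end-of-day notation.
raw = pd.Series(["10/10/1949 20:30", "10/10/1956 24:00"])

# Flag the "24:00" rows, parse them as 00:00, then add one day so that
# "24:00" means midnight at the end of the stated day.
is_24 = raw.str.endswith(" 24:00")
fixed = raw.str.replace(" 24:00", " 00:00", regex=False)
parsed = pd.to_datetime(fixed, format="%m/%d/%Y %H:%M")
parsed = parsed + pd.to_timedelta(is_24.astype(int), unit="D")
print(parsed)
```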

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 68947 entries, 1949-10-10 20:30:00 to 2013-09-09 23:00:00
Data columns (total 7 columns):
city         68947 non-null object
state        66406 non-null object
country      68947 non-null object
latitude     68947 non-null float64
longitude    68947 non-null float64
year         68947 non-null int64
distance     68947 non-null float64
dtypes: float64(3), int64(1), object(3)
memory usage: 4.2+ MB


In [7]:
df.head()

                                     city  state  country   latitude    longitude  year     distance
datetime
1949-10-10 20:30:00            san marcos     tx       us  29.883056   -97.941111  2004  1242.667772
1955-10-10 17:00:00  chester (uk/england)    NaN       gb  53.200000    -2.916667  2008  6515.416577
1956-10-10 21:00:00                  edna     tx       us  28.978333   -96.645833  2004  1211.971352
1960-10-10 20:00:00               kaneohe     hi       us  21.418056  -157.803611  2004  6960.923396
1961-10-10 19:00:00               bristol     tn       us  36.595000   -82.188889  2007   427.334113


In [8]:
# Let's remember the conditions:
    # Must be from 2004 onwards
    # Must be in the USA (the dataset also includes other countries, e.g. gb)
    # The effective radius must be around 300 km

ufo_dev = df[df.index >= "2004-01-01"]  # boolean mask: the index is not sorted by date
ufo_dev = ufo_dev[ufo_dev['country'] == "us"]
ufo_dev = ufo_dev[(ufo_dev['distance'] >= 300) & (ufo_dev['distance'] <= 400)]
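The `head()` outputs show that the index is not in chronological order (the file is ordered by month and day), and on recent pandas versions a `.loc` slice such as `df.loc["2004-01-01 00:00":]` on a non-monotonic `DatetimeIndex` may raise a `KeyError` when the bound is not an existing label. A boolean mask avoids that and keeps the original row order. The sketch below applies the same three conditions to a hypothetical five-row frame (all values made up):

```python
import pandas as pd

# Hypothetical, deliberately unsorted DatetimeIndex, like the notebook's.
idx = pd.to_datetime(["1949-10-10 20:30", "2004-10-10 04:18",
                      "2007-10-10 20:30", "2008-10-10 21:30",
                      "1961-10-10 19:00"])
toy = pd.DataFrame({"country": ["us", "us", "gb", "us", "us"],
                    "distance": [1242.7, 366.6, 350.0, 395.5, 427.3]},
                   index=idx)

# 1) From 2004 onwards: a mask works regardless of index order.
recent = toy[toy.index >= "2004-01-01"]
# 2) In the USA: exact comparison on the two-letter country code.
us_recent = recent[recent["country"] == "us"]
# 3) Distance band: between() is inclusive on both ends by default.
in_band = us_recent[us_recent["distance"].between(300, 400)]
print(in_band)
```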

In [9]:
ufo_dev.head()

                            city  state  country   latitude   longitude  year    distance
datetime
2004-10-10 04:18:00  terre haute     in       us  39.466667  -87.413889  2004  366.596468
2006-10-10 12:37:00  blairsville     ga       us  34.876111  -83.958333  2006  305.862395
2007-10-10 01:00:00  stockbridge     ga       us  33.544167  -84.233889  2007  382.729821
2007-10-10 20:30:00      conyers     ga       us  33.667500  -84.017778  2007  385.562497
2008-10-10 21:30:00   cincinnati     oh       us  39.161944  -84.456944  2008  395.547630


In [10]:
# Refine the search: keep the report whose distance is closest to the 300 km minimum
ufo_dev["distance"].idxmin()

Timestamp('2007-02-05 12:30:00')
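`idxmin()` returns the index *label* of the smallest value, not its position. Looking that label up again is fine while timestamps are unique, but if two reports share a timestamp, `.loc[label]` returns all of them, whereas `argmin()` on the underlying array gives exactly one position. A small sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical distances indexed by timestamp, with one duplicated timestamp.
idx = pd.to_datetime(["2004-10-10 04:18", "2007-02-05 12:30", "2007-02-05 12:30"])
dist = pd.Series([366.6, 300.0, 385.6], index=idx)

label = dist.idxmin()             # index label of the minimum (a Timestamp)
pos = dist.to_numpy().argmin()    # integer position of the minimum

print(label)                      # 2007-02-05 12:30:00
print(dist.loc[label])            # both rows that share that label
print(dist.iloc[pos])             # 300.0
```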

In [11]:
ufo_dev.loc[[pd.Timestamp('2007-02-05 12:30:00')]]

                             city  state  country  latitude   longitude  year    distance
datetime
2007-02-05 12:30:00  new tazewell     tn       us   36.4425  -83.599722  2007  300.016444


In [18]:
print("The town we are looking for is:")
print(ufo_dev.loc[ufo_dev["distance"].idxmin()])

The town we are looking for is:
city         new tazewell
state                  tn
country                us
latitude          36.4425
longitude        -83.5997
year                 2007
distance          300.016
Name: 2007-02-05 12:30:00, dtype: object


In [12]:
# It seems that the city we are looking for is New Tazewell, Tennessee

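best = ufo_dev.iloc[ufo_dev["distance"].to_numpy().argmin()]  # row with the smallest distance
location = [best['latitude'], best['longitude']]  # folium expects [lat, lon]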
mapa = folium.Map(
        location=location,
        zoom_start=12,
        tiles='OpenStreetMap'  # Stamen's tile service was moved to Stadia Maps in 2023
    )
folium.Marker(location).add_to(mapa)
mapa    
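The notebook uses the `distance` column as given and does not say what it is measured from. To check the ~300 km radius around the candidate town directly from coordinates, a great-circle (haversine) distance is one option. A minimal sketch assuming a spherical Earth with a mean radius of 6371 km; the helper names `haversine_km` and `within_radius` are hypothetical, and the coordinates are taken from the outputs above:

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical-Earth assumption


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def within_radius(center, point, radius_km=300.0):
    """True if `point` lies inside the circle of `radius_km` around `center`."""
    return haversine_km(*center, *point) <= radius_km


new_tazewell = (36.4425, -83.599722)   # the selected row
bristol = (36.595, -82.188889)         # from the df.head() output
san_marcos = (29.883056, -97.941111)   # from the df.head() output

print(round(haversine_km(*new_tazewell, *bristol), 1))
print(within_radius(new_tazewell, bristol))     # True
print(within_radius(new_tazewell, san_marcos))  # False
```

The same check could be applied to every report in `df` to count how many sightings fall inside the candidate's 300 km circle.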