**Overview:** In this lab assignment, you will practice the design and use of Python classes to solve a specific computational task involving AIS vessel data.

**The Scenario:**  You are working on a research project that is trying to understand the behavior of ocean vessels in the South China Sea (SCS).  Specifically, you have been given a large data file containing **Automatic Identification System (AIS) data** from vessels traveling in the SCS over a one-month period.  You can read about the basics of AIS on Wikipedia: https://en.wikipedia.org/wiki/Automatic_identification_system.

The data file is in comma separated value (CSV) format and has a header row indicating what is in each column.  Specification of the individual data fields is available from: https://www.navcen.uscg.gov/?pageName=AISMessagesAStatic.  The key identifier is the Maritime Mobile Service Identity (MMSI), a series of nine digits which are used to uniquely identify a ship.

The name of the file is ``jul2015.csv`` and is downloadable from the Lab 1 page on Sakai.

**Your Task:**  Your task is to write Python code to read, parse, and analyze the AIS data in order to answer several questions.  A few important notes on this assignment:

* This notebook contains all of the questions for this lab assignment.  


* You will answer these questions by writing and executing your code **in this notebook**.  


* There is a separate file ``oa3801_lab1.py`` that contains supporting code (Python classes and functions); some of this code is incomplete and will need to be finished by you.  Once completed, you will import this code into this notebook to answer these questions (more on this below).


* The goal of this assignment is to give you practice writing and using Python objects.  **You may NOT use the Pandas library for this assignment.**


As always, completing a complex task requires breaking things down into individual steps.  Accordingly, you should complete the following tasks.

**Task 0: Have a look at the data.**  The file ``jul2015.csv`` is big (~38MB), and Excel might have a little trouble with it.  You can also try opening the file with a text editor.  If you are having trouble, consider installing a general-purpose editor like SublimeText ( https://www.sublimetext.com ) or Atom ( https://www.atom.io ).  Atom is free to download and install, while SublimeText can be evaluated for free but asks you to buy a license for continued use.  Both run on Windows and Mac, and both are fairly powerful.  It's important that you have an editor that can handle large files.  The choice of a text editor is often a personal preference, but both of these are good options (and there are others out there).
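If neither editor is at hand, Python can also show the first few lines without loading the whole ~38MB file into memory. This is a minimal sketch using only the standard library; the helper name ``first_lines`` is hypothetical.

```python
from itertools import islice


def first_lines(lines, n=5):
    """Return the first n lines of any line iterable (e.g. an open file), newline removed."""
    # islice stops after n items, so the rest of a large file is never read
    return [line.rstrip('\n') for line in islice(lines, n)]


# Usage, assuming jul2015.csv is in the working directory:
# with open('jul2015.csv') as f:
#     for line in first_lines(f):
#         print(line)
print(first_lines(['MMSI,Name\n', '123456789,Alpha\n', '987654321,Bravo\n'], 2))
# ['MMSI,Name', '123456789,Alpha']
```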

**Task 1: Have a look at the starting code.**  The file ``oa3801_lab1.py``  contains definitions for two Python classes, along with some supporting functions.  


* A ``Coordinate`` class used for representing lat/lon positions and calculating great circle distances between them.


* A ``Vessel`` class used to store ship characteristics, along with a recorded sequence of AIS positions, known here as a *track*.
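To make the *great circle distance* idea concrete, here is a minimal sketch of what a ``calc_dist`` method can look like using the haversine formula. It is a hypothetical stand-in (``LatLon``), not the lab's ``Coordinate`` code; the Earth radius of 3440.065 nm and the nautical-mile output unit are assumptions.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_NM = 3440.065  # mean Earth radius in nautical miles (assumed value)


@dataclass(frozen=True)
class LatLon:
    """Hypothetical stand-in for the lab's Coordinate class (not its actual code)."""
    lat: float  # degrees, positive north
    lon: float  # degrees, positive east

    def calc_dist(self, other):
        """Great-circle distance to another LatLon in nautical miles (haversine formula)."""
        phi1, phi2 = math.radians(self.lat), math.radians(other.lat)
        dphi = phi2 - phi1
        dlmb = math.radians(other.lon - self.lon)
        a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
        return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(a))


# One degree of longitude along the equator is about 60 nautical miles
print(round(LatLon(0.0, 0.0).calc_dist(LatLon(0.0, 1.0)), 2))  # 60.04
```

The real ``Coordinate`` class may use a different formula or Earth radius, so expect small numerical differences from this sketch.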


Look over the class definitions and the additional functions.  Use the standalone notebook ``oa3801_lab1_supplement.ipynb`` to understand and practice using these classes.

**Task 2: Write some code to parse the data in the csv file and store it in a dictionary of ships.** 


* Create an empty dictionary in which each key is an MMSI (type string) and each value is the associated Vessel object.


* Open the csv file and read it line by line, parsing into individual fields.  For each line:

  * Get the ship from the dictionary by its MMSI value.  If the associated ship does not exist in the dictionary, then create a new Vessel and add it.  
  
  * Add the corresponding position information to the track of that ship.


* You ought to be able to populate the dictionary of ships and all of their corresponding track data *in a single pass through the csv data*!  (Reading through the data multiple times will be slooooow.  Why?)
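The single-pass idea can be sketched with the standard library's ``csv`` module. This is a minimal sketch under stated assumptions: it stores plain ``(time, lat, lon)`` tuples instead of ``Vessel`` objects, the column names in the sample are hypothetical (not the real ``jul2015.csv`` header), and ``io.StringIO`` stands in for the open file. Unlike ``split(',')``, ``csv`` also handles quoted fields that contain commas.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; these column names are illustrative, not the real jul2015.csv header
SAMPLE = """MMSI,VesselName,Lat,Lon,Time
548438100,Alpha,10.0,110.0,2015-07-01T00:00:00
600013035,Bravo,11.0,111.0,2015-07-01T00:05:00
548438100,Alpha,10.1,110.1,2015-07-01T00:10:00
"""


def build_tracks(csv_file):
    """Single pass over the rows: return {MMSI: [(time, lat, lon), ...]}."""
    tracks = defaultdict(list)  # a missing MMSI gets a new empty list automatically
    reader = csv.DictReader(csv_file)  # consumes the header row, keys each row by column name
    for row in reader:
        tracks[row['MMSI']].append((row['Time'], float(row['Lat']), float(row['Lon'])))
    return dict(tracks)


tracks = build_tracks(io.StringIO(SAMPLE))
print(len(tracks))               # 2 distinct ships
print(len(tracks['548438100']))  # 2 positions for the first ship
```

``defaultdict(list)`` plays the role of the "create it if it is missing" check, so each row is handled once and the file is read exactly once.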

Use the following cells to complete this task.

In [1]:
import GITHUB_Lab01 as lab1
from GITHUB_Lab01 import Coordinate
from GITHUB_Lab01 import Vessel
import datetime
from tabulate import tabulate


In [2]:
### PUT YOUR CODE HERE
# hint: you will need to import the Vessel class


def ships_dict(file):
    '''Read the AIS csv file in one pass and return a dict mapping MMSI -> Vessel.'''
    vessels_dict = {}  # MMSI (string) -> Vessel object

    with open(file) as f:  # the file is closed automatically when this block ends
        f.readline()  # skip the header row
        print('reading data from inputfile %s...' % file)

        for nextline in f:
            # strip() removes the trailing newline; split(',') breaks the row into fields
            lineitems = nextline.strip().split(',')

            # Name each field by its column position in the csv file
            vessel_name = lineitems[1]
            MMSI = lineitems[2]
            imo = lineitems[3]
            callsign = lineitems[4]
            vessel_type = lineitems[5]
            length = lineitems[6]
            lat = lineitems[7]
            long = lineitems[8]
            sog = lineitems[9]
            cog = lineitems[10]
            heading = lineitems[11]
            time = lineitems[12]

            # Create the Vessel the first time this MMSI appears
            if MMSI not in vessels_dict:
                vessels_dict[MMSI] = Vessel(MMSI, vessel_name, vessel_type, length)
            # This line is outside the if-block so that EVERY row adds a position,
            # not just the first row seen for each MMSI
            vessels_dict[MMSI].add_to_track(time, lat, long, callsign)

    return vessels_dict  # return the data so later cells can use it




In [3]:
v2 = ships_dict('jul2015.csv')  # parse once and keep the result, so the slow read is not repeated

# Sanity check: initial_position(), final_position() and calc_dist() should work for every vessel
for MMSI, data in v2.items():
    ip_timestamp, ip_coord = data.initial_position()
    fp_timestamp, fp_coord = data.final_position()
    distance = ip_coord.calc_dist(fp_coord)
    time_delta = fp_timestamp - ip_timestamp
    

reading data from inputfile jul2015.csv...


#### Good job.  Now use this to answer the following questions.

### Question 1: How many ships are in this data file? 
#### (Hint: use your dictionary)

In [4]:
# Each key in the dictionary is a unique MMSI, so the number of ships is the dictionary's length
print('There are %d ships in this data file.' % len(v2))

There are 12578 ships in this data file.


#### Wow, that's a lot.  Fortunately, for the specific project you are working on, you are only interested in ships that have at least 1000 positions in their track data.

### Question 2: How many ships have at least 1000 positions in their track data?  Print them out.

In [5]:
# Keep these ships in a list so later questions can reuse them without rescanning the whole dictionary
vessel_1000_list = []
for MMSI, vessel in v2.items():  # items() gives both the key (MMSI) and its Vessel object
    if len(vessel.track) >= 1000:  # vessel.track holds the recorded positions
        print(vessel.name)
        vessel_1000_list.append(vessel)
print('')
print('There are %d ships with at least 1000 positions.' % len(vessel_1000_list))




"600013035"
"Lct 568"
"Taiko"
"Golden Star5"
"Ferry Fukue"
"Lct 18"

There are 6 ships with at least 1000 positions.


### Question 3: For these ships, what is their average speed?

#### Here, we define "average speed" to be the distance between the initial position and the final position, divided by the interval of time between those two positions.  

#### Hint: great circle calculation for distance between initial and final position.

#### Another Hint: use datetime objects to calculate time intervals.
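For the time half of the calculation, subtracting two ``datetime`` objects gives a ``timedelta``, and ``timedelta.total_seconds()`` converts it to a plain number. This is a minimal sketch; the helper name ``average_speed_knots`` is hypothetical and the distance is assumed to already be in nautical miles.

```python
from datetime import datetime


def average_speed_knots(distance_nm, t_start, t_end):
    """Average speed in knots (nautical miles per hour) over [t_start, t_end]."""
    hours = (t_end - t_start).total_seconds() / 3600  # timedelta -> hours
    if hours <= 0:
        raise ValueError('t_end must be later than t_start')
    return distance_nm / hours


# Hypothetical example: 60 nm covered in 1 day and 6 hours
start = datetime(2015, 7, 1, 0, 0)
end = datetime(2015, 7, 2, 6, 0)
print(average_speed_knots(60.0, start, end))  # 60 / 30 = 2.0
```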

In [6]:
# Average speed for the ships with at least 1000 positions:
# great-circle distance / elapsed time, in nautical miles per SECOND
for ships in vessel_1000_list:
    (date_time_initial, coord_initial) = ships.initial_position()
    (date_time_final, coord_final) = ships.final_position()  # unpack the final time and coordinate
    distance_differential = coord_final.calc_dist(coord_initial)

    time_delta = date_time_final - date_time_initial
    time_delta_seconds = time_delta.total_seconds()
    average_speed = distance_differential / time_delta_seconds
    print(ships.name, average_speed)

"600013035" 0.00023693641151921645
"Lct 568" 4.046703232766495e-05
"Taiko" 5.962765922812879e-05
"Golden Star5" 7.591348643908771e-07
"Ferry Fukue" 2.1247029301991035e-05
"Lct 18" 3.918681022443403e-06


In [7]:
# Same calculation, but with the elapsed time expressed in hours
for i, ships in enumerate(vessel_1000_list, start=1):  # i numbers the output lines
    (date_time_initial, coord_initial) = ships.initial_position()
    (date_time_final, coord_final) = ships.final_position()  # unpack the final time and coordinate
    distance_differential = coord_final.calc_dist(coord_initial)

    time_delta = date_time_final - date_time_initial
    time_delta_hours = time_delta.days * 24 + time_delta.seconds / 3600  # convert the timedelta to hours
    average_speed = distance_differential / time_delta_hours

    print('%d ship name:%s, average speed:%.5f nm/hour' % (i, ships.name, average_speed))

1 ship name:"600013035", average speed:0.85297 nm/hour
2 ship name:"Lct 568", average speed:0.14568 nm/hour
3 ship name:"Taiko", average speed:0.21466 nm/hour
4 ship name:"Golden Star5", average speed:0.00273 nm/hour
5 ship name:"Ferry Fukue", average speed:0.07649 nm/hour
6 ship name:"Lct 18", average speed:0.01411 nm/hour


#### This looks a bit suspicious.  Those great circle distances between initial position and final position are perhaps too small.

#### Let's repeat the previous analysis, using the total track distance instead of the great circle distance.  To do this, we need to write another function.

### Question 4: Using the total track distance (instead) for these ships, calculate the correct average speed for each of these ships.

Specifically, we calculate the total distance as the sum of the distances between successive positions (yes, we assume that ships traveled in straight lines between successive position readings).

**Task 3:** The ``Vessel`` class has an incomplete method called ``total_track_distance``.  Complete this function, then use the block below to report the total track distance for each of these ships.
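The leg-by-leg sum can be sketched as follows. This is not the lab's ``Vessel.total_track_distance``; it assumes a hypothetical track stored as a dict mapping ``datetime`` to ``(lat, lon)`` and uses the haversine formula with an assumed Earth radius of 3440.065 nm. The out-and-back example also shows why a first-to-last distance can badly understate how far a ship actually sailed.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_NM = 3440.065  # mean Earth radius in nautical miles (assumed value)


def haversine_nm(p, q):
    """Great-circle distance in nautical miles between (lat, lon) pairs given in degrees."""
    phi1, phi2 = math.radians(p[0]), math.radians(q[0])
    dphi, dlmb = phi2 - phi1, math.radians(q[1] - p[1])
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(a))


def total_track_distance(track):
    """Sum of great-circle legs between successive positions.

    track: hypothetical dict {datetime: (lat, lon)}; positions are put in time order first.
    """
    positions = [track[t] for t in sorted(track)]
    return sum(haversine_nm(a, b) for a, b in zip(positions, positions[1:]))


# Out-and-back example: the ship ends exactly where it started
t0 = datetime(2015, 7, 1)
track = {t0: (0.0, 0.0), t0 + timedelta(hours=1): (0.0, 1.0), t0 + timedelta(hours=2): (0.0, 0.0)}
print(round(total_track_distance(track), 2))                     # 120.08 nm sailed
print(haversine_nm(track[t0], track[t0 + timedelta(hours=2)]))  # 0.0 nm displacement
```

Sorting the timestamps first matters: the sum is only meaningful when each leg joins two reports that are consecutive in time.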

In [8]:
def average_speed(vessel_1000_list):
    '''Print each ship's average speed, using the total track distance.'''
    for k, ships_q4 in enumerate(vessel_1000_list, start=1):  # k numbers the output lines
        # Unpack the initial and final timestamps (the coordinates are not needed here)
        (date_time_initial_q4, _) = ships_q4.initial_position()
        (date_time_final_q4, _) = ships_q4.final_position()

        # Sum of the legs between successive positions (the completed total_track_distance method)
        distance_q4 = ships_q4.total_track_distance()

        # Elapsed time in hours
        time_delta_q4 = date_time_final_q4 - date_time_initial_q4
        time_delta_hours_q4 = time_delta_q4.days * 24 + time_delta_q4.seconds / 3600

        average_speed_q4 = distance_q4 / time_delta_hours_q4
        print('%d ship name:%s, average speed:%.3f nm/hour' % (k, ships_q4.name, average_speed_q4))

In [9]:
average_speed(vessel_1000_list)

1 ship name:"600013035", average speed:1.354 nm/hour
2 ship name:"Lct 568", average speed:34.576 nm/hour
3 ship name:"Taiko", average speed:5.298 nm/hour
4 ship name:"Golden Star5", average speed:0.105 nm/hour
5 ship name:"Ferry Fukue", average speed:5.864 nm/hour
6 ship name:"Lct 18", average speed:1.413 nm/hour


#### Does anything else look suspicious here?  Use the next block to respond.  (Use words, not code, to answer the question.)


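**My Answer:** The great-circle calculation in Question 3 gives much smaller average speeds because it only measures the straight-line displacement from the first position to the last and ignores every leg in between; a ship that sails away and comes back can show almost no displacement at all. Summing every leg gives the full distance traveled, and therefore a larger average speed. Beyond that, "Lct 568" averaging about 34.6 nm/hour is far faster than the other five ships, which may point to erroneous position reports (large jumps) in its track.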

### Question 5: Consider all ships (not just those with more than 1000 pings). Are there any ships with multiple call signs?  Write some code to answer this question.  Have your code print out the total number of MMSIs that have multiple call signs, and for each of them print out the call signs.

In [10]:
def multiple_callsigns(vessel_dictionary):
    '''Print how many MMSIs have more than one call sign; return {MMSI: call signs} for them.'''
    multi_callsign_dict = {}  # MMSI -> call signs, only for MMSIs with more than one
    for (MMSI, vessel_obj) in vessel_dictionary.items():  # iterate over the argument, not the global v2
        if len(vessel_obj.call_signs) > 1:
            multi_callsign_dict[MMSI] = vessel_obj.call_signs
    print('')
    print('There are %d ships with multiple call signs' % len(multi_callsign_dict))
    return multi_callsign_dict

In [11]:
# multi_callsigns maps each of these MMSIs to its call signs
multi_callsigns = multiple_callsigns(v2)


There are 120 ships with multiple call signs


### Question 6: Do any of these look suspicious to you?  
#### How would you go about assessing whether each of these is a single vessel or multiple vessels?  
#### How would your code need to change to support this analysis?
#### (Type your answer below.  You do NOT have to implement this in code.)

**My Answer:** With my current code, some ships look like they have multiple call signs when they really do not: one of their "call signs" is just an empty string (`""`). An empty entry most likely means the call-sign field was blank in those reports, not that the ship has a second call sign. To fix this, I would add a conditional that skips the empty `""` entries when counting call signs.
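The conditional described in this answer can be sketched as a filter that drops blank call signs before counting. It works on a hypothetical mapping of MMSI to call-sign strings rather than on ``Vessel`` objects, and it also strips surrounding double quotes, because the ship names printed earlier still carry their quotes, so a blank call sign may arrive as ``""`` rather than as an empty string.

```python
def mmsis_with_multiple_call_signs(call_signs_by_mmsi):
    """Return {MMSI: sorted non-empty call signs} for MMSIs with more than one real call sign.

    call_signs_by_mmsi: hypothetical mapping MMSI -> iterable of call-sign strings.
    Blank entries such as '' or '""' are ignored before counting.
    """
    result = {}
    for mmsi, signs in call_signs_by_mmsi.items():
        # strip whitespace and surrounding double quotes, then drop anything left empty
        cleaned = {s.strip().strip('"') for s in signs} - {''}
        if len(cleaned) > 1:
            result[mmsi] = sorted(cleaned)
    return result


example = {
    '111111111': ['"ABC1"', '""'],      # only one real call sign once the blank is dropped
    '222222222': ['"XYZ9"', '"XYZ8"'],  # two different call signs
}
print(mmsis_with_multiple_call_signs(example))  # {'222222222': ['XYZ8', 'XYZ9']}
```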

### For the last question, we need to complete an additional function in the ``Vessel`` class

**Task 4:** The ``Vessel`` class has one more incomplete method, namely:


* ``last_known_position`` : this function should return the most recent (timestamp, position) based on an argument, called the *request_time*. 

  * If ``request_time`` is before the initial timestamp for a vessel, return (``None``, ``None``).
  * If it is after the final timestamp, return the final available (timestamp, position).
  * If ``request_time`` is between the initial and final timestamps, return the (timestamp, position) just prior to ``request_time``.
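One way to satisfy these three rules is a binary search over the sorted timestamps with ``bisect.bisect_right``. This is a minimal sketch, assuming a hypothetical track stored as a dict mapping ``datetime`` to position; unlike the lab's method it is a standalone function, and it treats a timestamp exactly equal to ``request_time`` as the last known position (an assumption about how "just prior" should be read).

```python
from bisect import bisect_right
from datetime import datetime


def last_known_position(track, request_time):
    """Return the most recent (timestamp, position) at or before request_time.

    track: hypothetical dict {datetime: position}. Returns (None, None) if
    request_time is earlier than every timestamp in the track.
    """
    times = sorted(track)                  # chronological timestamps
    i = bisect_right(times, request_time)  # number of timestamps <= request_time
    if i == 0:
        return (None, None)                # request is before the first report
    t = times[i - 1]                       # latest timestamp not after request_time
    return (t, track[t])                   # also covers a request after the final report


track = {
    datetime(2015, 7, 1, 0, 0): (10.0, 110.0),
    datetime(2015, 7, 1, 1, 0): (10.1, 110.1),
}
print(last_known_position(track, datetime(2015, 6, 30)))        # (None, None)
print(last_known_position(track, datetime(2015, 7, 1, 0, 30)))  # the 00:00 report
print(last_known_position(track, datetime(2015, 7, 2)))         # the final (01:00) report
```

Sorting on every call keeps the sketch short but is slow inside Question 7's loop, which calls it once per SHIP1 position; storing the sorted timestamps once per vessel avoids that.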

Complete this function, then answer Question 7...

### Question 7: Two specific vessels, MMSI '548438100' and MMSI '600013035', are suspected of a rendezvous.  Based on the track data for each ship, what is the closest that they came to each other (and at what time)?

#### Hint: Look at the track for SHIP1.  For each (time, position) entry in SHIP1's track, get the last_known_position for SHIP2 at that time, and calculate the distance.  Return the smallest of these, along with the time.

In [12]:
ship1 = v2['600013035']
ship2 = v2['548438100']

def rendezvous(ship1, ship2):
    '''Return (smallest distance, time) between ship1's positions and ship2's last known positions.'''
    distance_list = []  # (distance, time) pairs; the smallest one is the closest approach

    for ship1_time in ship1.track:  # the track's keys are datetime objects
        ship1_position = ship1.track[ship1_time]  # ship1's position at that time
        # last_known_position returns a (timestamp, position) tuple; unpack it
        (ship2_time, ship2_position) = ship2.last_known_position(ship1_time)

        # Skip (None, None) results (ship2 has no position yet), which calc_dist cannot handle
        if isinstance(ship1_position, Coordinate) and isinstance(ship2_position, Coordinate):
            distance = ship1_position.calc_dist(ship2_position)
            distance_list.append((distance, ship1_time))

    if not distance_list:  # the two tracks never overlap in time
        return (None, None)
    return min(distance_list)  # tuples compare by distance first: closest approach and its time

In [None]:
rendezvous(ship1,ship2)

#### Does it matter which MMSI is SHIP1 and SHIP2?  Try it both ways.  What do you observe?  Why?  

In [None]:
rendezvous(ship2,ship1)
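# It does matter which MMSI is ship1 and which is ship2. Distances are only computed at ship1's
# timestamps, using ship2's last known (possibly stale) position, so swapping the ships changes
# which moments are compared. Gaps in the track data (for example, an AIS transponder that was
# turned off or faulty) make the two results differ even more.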