# Guidelines for Project 1

This document contains guidelines, requirements, and suggestions for Project 1.

## Team Effort

Before anything, remember that Projects are a **group effort**: Working closely with your teammates is a requirement. This both helps teach real-world collaborative workflows, and enables you to tackle more difficult problems than you'd be able to working alone. 

In other words, working in groups allows you to **work smart** and **dream big**. Take advantage of it!

## Project Proposal

Before you start writing any code, your group should outline the scope and purpose of your project. This helps provide direction and prevent [scope creep](https://en.wikipedia.org/wiki/Scope_creep).

Write this as a brief summary of your interests and intent, including:

* The kind of data you'd like to work with/field you're interested in (e.g., geodata, weather data, etc.)

* The kinds of questions you'll be asking of that data

* Possible source for such data

In other words, write down what kind of data you plan to work with, and what kinds of questions you'd like to ask of it. This constitutes your Project Proposal/Outline, and should look something like this:

> Our project is to uncover patterns in criminal activity around Los Angeles. We'll examine relationships between types of crime and location; crime rates and times of day; trends in crime rates over the course of the year; and related questions, as the data admits.

## Finding Data

Once your group has written an outline, it's time to start hunting for data. You are free to use data from any source, but we recommend the following curated sources of high-quality data:

* [data.world](https://data.world/)

* [Kaggle](https://www.kaggle.com/)

* [Data.gov](https://www.data.gov)

Chances are you'll have to update your Project Outline as you explore the available data. **This is fine**&mdash;adjustments like this are part of the process! Just make sure everyone in the group is up-to-speed on the goals of the project as you make changes.

## Data Cleanup &amp; Analysis

With data in hand, it's time to tackle development and analysis. This is where the fun starts!

Inevitably, the analysis process can be broken into two broad phases: **Exploration &amp; Cleanup** and **Analysis** proper.

As you've learned, you'll need to explore, clean, and reformat your data before you can begin to answer your research questions. We recommend keeping track of these exploration and cleanup steps in a dedicated Jupyter Notebook, both for organization's sake and to make it easier to  present your work later.

Similarly, after you've massaged your data and are ready to start crunching numbers, you should keep track of your work in a Jupyter Notebook dedicated specifically to analysis.

During both phases, **don't forget to include plots**! Don't make the mistake of waiting to build figures until you're preparing your presentation. Creating them along the way can reveal insights and interesting trends in the data that you might not notice otherwise.

Finally, be sure that your projects meet the [technical requirements](TechnicalRequirements.md).

## Presentation

After you've analyzed your data to your satisfaction, you'll put together a presentation to show of your work, explain your process, and discuss your conclusions.

This presentation will be delivered as a slideshow, and should give your classmates and instructional staff an overview of your work. PowerPoint, Keynote, and Google Slides are all acceptable for building slides. 

As long as your slides meet the [presentation requirements](PresentationRequirements.md), you are free to structure the presentation however you wish, but students are often successful with the format laid out in the [presentation guidelines](PresentationGuidelines.md).

## Sample Ideas &amp; Inspiration



- - - 

### Copyright

Coding Boot Camp &copy; 2017. All Rights Reserved.


In [1]:

import json
import requests
from time import sleep
import pandas as pd

api_key = '1bb6cfd646261b1acda61748ca2bb5a7'



In [2]:

url = 'http://api.themoviedb.org/3'
search_type = '/search/tv'
query = "star%20trek"
dat = requests.get(url + search_type + "?page=1&query=" + query + "&api_key=" + api_key).json()


print(json.dumps(dat['results'], indent = 2, sort_keys= True))

[
  {
    "backdrop_path": "/lVNUYvjVRfbf0p7peJh5JhyBIir.jpg",
    "first_air_date": "1966-09-08",
    "genre_ids": [
      18,
      10765
    ],
    "id": 253,
    "name": "Star Trek",
    "origin_country": [
      "US"
    ],
    "original_language": "en",
    "original_name": "Star Trek",
    "overview": "Space. The Final Frontier. The U.S.S. Enterprise embarks on a five year mission to explore the galaxy. The Enterprise is under the command of Captain James T. Kirk with First Officer Mr. Spock, from the planet Vulcan. With a determined crew, the Enterprise encounters Klingons, Romulans, time paradoxes, tribbles and genetic supermen lead by Khan Noonian Singh. Their mission is to explore strange new worlds, to seek new life and new civilizations, and to boldly go where no man has gone before.",
    "popularity": 32.878016,
    "poster_path": "/3ATqzWYDbWOV2RBLWNwA43InT60.jpg",
    "vote_average": 8.07,
    "vote_count": 314
  },
  {
    "backdrop_path": "/s3kVP6R3LbJvvoPnDQEcJNEH2d

In [3]:
show_id = dat['results'][0]['id']
show_id

253

In [4]:
search_type = f'/tv/{show_id}'

dat = requests.get(url + search_type + "?page=1&query=" + query + "&api_key=" + api_key).json()

In [5]:
seasons = []
for season in dat['seasons']:
    seasons.append({'season_num': season['season_number'], 'episode_count': season['episode_count']})
print(seasons[1])

{'season_num': 1, 'episode_count': 29}


In [6]:
search_type = f'/tv/{show_id}/credits'

dat = requests.get(url + search_type + "?page=1&query=" + query + "&api_key=" + api_key).json()

main_cast = []
for character in dat['cast']:
    main_cast.append(character['name'])
# 'Majel Barrett', 'Bill Blackburn','Frank da Vinci' not really main cast?
main_cast


['William Shatner',
 'Leonard Nimoy',
 'DeForest Kelley',
 'Majel Barrett',
 'Walter Koenig',
 'George Takei',
 'Nichelle Nichols',
 'James Doohan',
 'Majel Barrett',
 'Bill Blackburn',
 'Frank da Vinci']

In [7]:
guest_list = []
guest_ids = []
id_to_name = {}
for season in seasons:
    season_num = season['season_num']
    episode_count = season['episode_count']
    for episode_num in range(1,episode_count):
        print(season_num, ':' , episode_num, end = '\t')
        sleep(.3)
        dat = requests.get(f'https://api.themoviedb.org/3/tv/{show_id}/season/{season_num}/episode/{episode_num}?api_key={api_key}&language=en-US').json()
        try:
            for star in dat['guest_stars']:
                if star['id'] not in guest_ids and star['name'] not in main_cast:
                    guest_list.append(star['name'])
                    guest_ids.append(star['id'])
                    id_to_name[star['id']] = star['name']

                    
        except KeyError:
            print('uh-oh', season_num, ':' , episode_num)
print()
print('DONE')



0 : 1	0 : 2	0 : 3	0 : 4	1 : 1	1 : 2	1 : 3	1 : 4	1 : 5	1 : 6	1 : 7	1 : 8	1 : 9	1 : 10	1 : 11	1 : 12	1 : 13	1 : 14	1 : 15	1 : 16	1 : 17	1 : 18	1 : 19	1 : 20	1 : 21	1 : 22	1 : 23	1 : 24	1 : 25	1 : 26	1 : 27	1 : 28	2 : 1	2 : 2	2 : 3	2 : 4	2 : 5	2 : 6	2 : 7	2 : 8	2 : 9	2 : 10	2 : 11	2 : 12	2 : 13	2 : 14	2 : 15	2 : 16	2 : 17	2 : 18	2 : 19	2 : 20	2 : 21	2 : 22	2 : 23	2 : 24	2 : 25	3 : 1	3 : 2	3 : 3	3 : 4	3 : 5	3 : 6	3 : 7	3 : 8	3 : 9	3 : 10	3 : 11	3 : 12	3 : 13	3 : 14	3 : 15	3 : 16	3 : 17	3 : 18	3 : 19	3 : 20	3 : 21	3 : 22	3 : 23	
DONE


In [10]:
#print(guest_list)
#print(guest_ids)
#print(id_to_name)
#id_to_name[132710]

{83913: 'Joseph Mell', 168061: 'Clegg Hoyt', 82510: 'Michael Dugan', 14508: 'John Hoyt', 1213153: 'Malachi Throne', 101742: 'Leonard Mudie', 39770: 'Adam Roarke', 1214921: 'Bob Johnson', 153336: 'Meg Wyllie', 90520: 'Susan Oliver', 164771: 'Peter Duryea', 1214891: 'Georgia Schmidt', 30551: 'Jeffrey Hunter', 1214907: 'Janos Prohaska', 78692: 'Laurel Goodwin', 1775091: 'Francine Pyne', 1214881: 'Vince Howard', 1759: 'Grace Lee Whitney', 15621: 'Bruce Watson', 1214880: 'Jeanne Bal', 1214877: 'Eddie Paskey', 15620: 'Alfred Ryder', 121759: 'Michael Zaslow', 1315468: 'Bill Bradley', 1555684: 'Ron Veto', 135066: 'Abraham Sofaer', 1745: 'Gene Roddenberry', 1214885: 'Charles J. Stewart', 15626: 'Robert Walker, Jr.', 9805: 'Sally Kellerman', 1214884: 'Robert H. Justman', 1214883: 'Andrea Dromm', 127601: 'Lloyd Haynes', 15628: 'Paul Carr', 8496: 'Paul Fix', 246: 'Gary Lockwood', 15635: 'Stewart Moss', 1214878: 'Bruce Hyde', 1766901: 'Edward Madden', 1214888: 'Jim Goodwin', 1769423: 'Jon Kowal', 1

In [11]:
#movies format is {'Movie': movie['original_title'], 'movie_id': movie['id'], 'guest_names': [name, name, name] , 'guest_ids': [id_num, id_num, id_num], 'count': 3}
movies = {}
count = 0
#movie_array = []
# loops through each guest star
for index, id_num in enumerate(guest_ids):
#if True: # use with id_num below to test the Rogers error
    print(count, '/', len(guest_ids)-1, end = ' \t')
    #id_num = 1213145 #Rogers' actual number is 132710
    url = f'https://api.themoviedb.org/3/person/{id_num}/movie_credits?api_key={api_key}&language=en-US'
    dat = requests.get(url).json()
    sleep(.3)
    #print(json.dumps(dat , indent = 2, sort_keys= True))
    try:
        # each movie for which current guest star was listed as a cast member
        for movie in dat['cast']:
            #print(json.dumps(dat['cast, indent = 2, sort_keys= True))
            if movie['id'] not in movies:
                movies[movie['id']] = {'Movie': movie['original_title'], 'movie_id': movie['id'], 'guest_names': [] , 'guest_ids': [], 'count': 0}
                
            movies[movie['id']]['guest_names'].append(id_to_name[id_num])
            movies[movie['id']]['guest_ids'].append(id_num)
            movies[movie['id']]['count'] += 1
    except KeyError:
        print(f'\nERROR - {url} \n {guest_ids.pop(index)}')       
        
        if id_num in id_to_name:
            del id_to_name[id_num]
        #Elizabeth Rogers and Peter Duryea errors are fine they each now exist under two ids, one of which is correct
    count += 1
    
#print(movies)

0 / 301 	1 / 301 	2 / 301 	3 / 301 	4 / 301 	5 / 301 	6 / 301 	7 / 301 	8 / 301 	9 / 301 	10 / 301 	11 / 301 	12 / 301 	13 / 301 	14 / 301 	15 / 301 	16 / 301 	17 / 301 	18 / 301 	19 / 301 	20 / 301 	21 / 301 	22 / 301 	23 / 301 	24 / 301 	25 / 301 	26 / 301 	27 / 301 	28 / 301 	29 / 301 	30 / 301 	31 / 301 	32 / 301 	33 / 301 	34 / 301 	35 / 301 	36 / 301 	37 / 301 	38 / 301 	39 / 301 	40 / 301 	41 / 301 	42 / 301 	43 / 301 	44 / 301 	45 / 301 	46 / 301 	47 / 301 	48 / 301 	49 / 301 	50 / 301 	51 / 301 	52 / 301 	53 / 301 	54 / 301 	55 / 301 	56 / 301 	57 / 301 	58 / 301 	59 / 301 	60 / 301 	61 / 301 	62 / 301 	63 / 301 	64 / 301 	65 / 301 	66 / 301 	67 / 301 	68 / 301 	69 / 301 	70 / 301 	71 / 301 	
ERROR - https://api.themoviedb.org/3/person/1214892/movie_credits?api_key=1bb6cfd646261b1acda61748ca2bb5a7&language=en-US 
 1214892
72 / 300 	73 / 300 	74 / 300 	75 / 300 	76 / 300 	77 / 300 	78 / 300 	79 / 300 	80 / 300 	81 / 300 	82 / 300 	83 / 300 	84 / 300 	85 / 300 	86 / 300 	87 / 30

In [12]:

df = pd.DataFrame(movies).T

df = df.set_index('Movie')
df = df.sort_values('count',ascending = False)
df.to_csv('first_returns.csv')
df

Unnamed: 0_level_0,count,guest_ids,guest_names,movie_id
Movie,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
The Greatest Story Ever Told,6,"[135066, 1820, 15693, 24826, 15942, 16074]","[Abraham Sofaer, Mark Lenard, John Crawford, J...",2428
Spartacus,6,"[14508, 15949, 7074, 12431, 14256, 1214919]","[John Hoyt, Vic Perrin, Peter Brocco, Carey Lo...",967
True Grit,5,"[15620, 15655, 5247, 161405, 9596]","[Alfred Ryder, Kim Darby, John Fiedler, Ron So...",17529
Batman,5,"[100799, 14256, 1214919, 291, 16108]","[George Sawaya, Dick Crockett, Gil Perkins, Fr...",2661
The Cage,5,"[14508, 90520, 164771, 30551, 78692]","[John Hoyt, Susan Oliver, Peter Duryea, Jeffre...",433524
Emperor of the North Pole,5,"[1214888, 3339, 5695, 15781, 14069]","[Jim Goodwin, Elisha Cook Jr., Sid Haig, Hal B...",31943
Scorpio,5,"[4076, 15765, 15954, 16030, 16051]","[Morgan Farley, John Colicos, Celeste Yarnall,...",42738
To Kill a Mockingbird,5,"[8496, 8490, 6838, 8499, 16055]","[Paul Fix, John Megna, Frank Overton, William ...",595
Brute Corps,5,"[15628, 15719, 126903, 15993, 12298]","[Paul Carr, Charles Macaulay, Joseph Bernard, ...",102178
Star Trek: Of Gods And Men,5,"[1759, 1213151, 152660, 15954, 16052]","[Grace Lee Whitney, Lawrence Montaigne, Arlene...",18231


In [13]:
df = pd.read_csv('first_returns.csv')

In [14]:
#which two star trek actors worked with each other the most
series = df['guest_ids']
#pairing_dict_names is formatted {star_id_num:{costar_id_num: num_of_pairings, costar_id_num: num_of_pairings], next_star_id_num:{costar_id_num: num_of_pairings, costar_id_num: num_of_pairings]}
#pairing_dict_names is formatted {star:{costar: num_of_pairings, costar: num_of_pairings], next_star:{costar: num_of_pairings, costar: num_of_pairings]}
#pairing_dict_titles is formatted {star:{costar: [title, title, title], costar: [title, title], next_star:{costar: [title, title] , costar: [title, title, title]}

pairing_dict = {}
pairing_dict_names = {}
pairing_dict_titles = {}
for movie in movies:
    id_nums = movies[movie]['guest_ids']
    title = movies[movie]['Movie']
    for id_num in id_nums: #human id_nums
        #print(id_num)
        if id_num not in pairing_dict:
            pairing_dict[id_num] = {}
            pairing_dict_names[id_to_name[id_num]] = {}
            pairing_dict_titles[id_to_name[id_num]] = {}
        for other_id_num in id_nums:
            if other_id_num not in pairing_dict[id_num] and other_id_num != id_num:
                pairing_dict[id_num][other_id_num] = 0
                pairing_dict_names[id_to_name[id_num]][id_to_name[other_id_num]] = 0
                pairing_dict_titles[id_to_name[id_num]][id_to_name[other_id_num]] = []
            if other_id_num != id_num:
                pairing_dict[id_num][other_id_num] += 1
                pairing_dict_names[id_to_name[id_num]][id_to_name[other_id_num]] += 1
                pairing_dict_titles[id_to_name[id_num]][id_to_name[other_id_num]].append(title)
                    
print(pairing_dict_titles)

{'Joseph Mell': {'Kathie Browne': ['City of Fear', 'Murder by Contract'], 'Phillip Pine': ['Murder by Contract'], 'Charles Drake': ['Back Street'], 'John Crawford': ['The Big Heat'], 'Celia Lovsky': ['The Big Heat'], 'Sarah Marshall': ['Lord Love a Duck'], 'Whit Bissell': ['I Was a Teenage Werewolf']}, 'Kathie Browne': {'Joseph Mell': ['City of Fear', 'Murder by Contract'], 'Phillip Pine': ['Murder by Contract', 'Brainstorm'], 'Jeffrey Hunter': ['Brainstorm']}, 'Phillip Pine': {'Joseph Mell': ['Murder by Contract'], 'Kathie Browne': ['Murder by Contract', 'Brainstorm'], 'Jeffrey Hunter': ['Brainstorm'], 'Michael Strong': ['Dead Heat on a Merry-Go-Round'], 'Mark Lenard': ['Outrage'], 'Jason Wingreen': ['Outrage'], 'Hal Baylor': ['The Set-Up'], 'Charles Drake': ['The Price of Fear'], 'Warren Stevens': ['The Price of Fear'], 'William Schallert': ['Hoodlum Empire'], 'Whit Bissell': ['Hoodlum Empire'], 'Lee Delano': ['Project X'], 'Keye Luke': ['Project X'], 'Arthur Batanides': ['The Cat At

In [15]:
#absolutely useless, must be a better format
pd.DataFrame(pairing_dict_names)

Unnamed: 0,Abraham Sofaer,Adam Roarke,Alfred Ryder,Angelique Pettyjohn,Anna Karen,Anthony Caruso,Arlene Martel,Arnold Moss,Arthur Batanides,BarBara Luna,...,Willard Sage,William Campbell,William Marshall,William O'Connell,William Sargent,William Schallert,William Smithers,William Windom,William Wintersole,Yvonne Craig
Abraham Sofaer,,,,,,,,,,1.0,...,,,,,1.0,,,,,
Adam Roarke,,,,1.0,,,,,,,...,,,,,,,,,,
Alfred Ryder,,,,,,,,,,,...,,,,,,,,,,
Angelique Pettyjohn,,1.0,,,,,,,,,...,,,,,,,,,,
Anthony Caruso,,,,,,,,,,,...,,,,,,,,,,
Arlene Martel,,,,,,,,,,,...,,,,,,,,,,
Arnold Moss,,,,,,,,,,,...,,,,,,,,,,
Arthur Batanides,,,,,,,,,,1.0,...,,,,,,,,,,
BarBara Luna,1.0,,,,,,,,1.0,,...,,,,,,,,,,
Barbara Anderson,,,,,,,,,,,...,,,,,,,,,,


In [16]:
#Generating memory-alpha entries
#should really include all cast, not just guest stars
#pairing_dict_names is formatted {star:{costar: pairings, costar: pairings], next_star:{costar: pairings, costar: pairings]}
#pairing_dict_titles is formatted {star:{costar: [title, title, title], costar: [title, title], next_star:{costar: [title, title] , costar: [title, title, title]}
for star in pairing_dict_names:
    flag = False
    also = ''
    for costar in pairing_dict_names[star]:
        #pairings is number of movies in common
        pairings = pairing_dict_names[star][costar]
        
        if pairings > 2:
            print(f'http://memory-alpha.wikia.com/wiki/{star}?veaction=edit'.replace(' ','_'))
            print(f'{star} {also}worked with fellow Star Trek guest star {costar} in {pairings} different movies: ', end = '')
            num = len(pairing_dict_titles[star][costar]) - 1
            
            for index, title in enumerate(pairing_dict_titles[star][costar]):
                if index == num:
                    print('and ', end = '')
                print( title, end = '')
                if index != num:
                    print(', ', end = '')
                else:
                    print('.')
                also = 'also '

http://memory-alpha.wikia.com/wiki/John_Crawford?veaction=edit
John Crawford worked with fellow Star Trek guest star Michael Ansara in 3 different movies: The Greatest Story Ever Told, Serpent of the Nile, and Slaves of Babylon.
http://memory-alpha.wikia.com/wiki/Whit_Bissell?veaction=edit
Whit Bissell worked with fellow Star Trek guest star John Hoyt in 4 different movies: Brute Force, Lost Continent, Trial, and Never So Few.
http://memory-alpha.wikia.com/wiki/Whit_Bissell?veaction=edit
Whit Bissell also worked with fellow Star Trek guest star Jeff Corey in 4 different movies: Brute Force, Canon City, Red Mountain, and Somewhere in the Night.
http://memory-alpha.wikia.com/wiki/Whit_Bissell?veaction=edit
Whit Bissell also worked with fellow Star Trek guest star Roy Jenson in 3 different movies: Soylent Green, 5 Card Stud, and The Caine Mutiny.
http://memory-alpha.wikia.com/wiki/Whit_Bissell?veaction=edit
Whit Bissell also worked with fellow Star Trek guest star Anthony Caruso in 3 diff

In [17]:
#compare which three star trek actors worked with each other the most

In [18]:
#which star trek actor worked with the most other star trek actors