# Guidelines for Project 1

This document contains guidelines, requirements, and suggestions for Project 1.

## Team Effort

Before anything, remember that Projects are a **group effort**: Working closely with your teammates is a requirement. This both helps teach real-world collaborative workflows, and enables you to tackle more difficult problems than you'd be able to working alone. 

In other words, working in groups allows you to **work smart** and **dream big**. Take advantage of it!

## Project Proposal

Before you start writing any code, your group should outline the scope and purpose of your project. This helps provide direction and prevent [scope creep](https://en.wikipedia.org/wiki/Scope_creep).

Write this as a brief summary of your interests and intent, including:

* The kind of data you'd like to work with/field you're interested in (e.g., geodata, weather data, etc.)

* The kinds of questions you'll be asking of that data

* Possible source for such data

In other words, write down what kind of data you plan to work with, and what kinds of questions you'd like to ask of it. This constitutes your Project Proposal/Outline, and should look something like this:

> Our project is to uncover patterns in criminal activity around Los Angeles. We'll examine relationships between types of crime and location; crime rates and times of day; trends in crime rates over the course of the year; and related questions, as the data admits.

## Finding Data

Once your group has written an outline, it's time to start hunting for data. You are free to use data from any source, but we recommend the following curated sources of high-quality data:

* [data.world](https://data.world/)

* [Kaggle](https://www.kaggle.com/)

* [Data.gov](https://www.data.gov)

Chances are you'll have to update your Project Outline as you explore the available data. **This is fine**&mdash;adjustments like this are part of the process! Just make sure everyone in the group is up-to-speed on the goals of the project as you make changes.

## Data Cleanup &amp; Analysis

With data in hand, it's time to tackle development and analysis. This is where the fun starts!

Inevitably, the analysis process can be broken into two broad phases: **Exploration &amp; Cleanup** and **Analysis** proper.

As you've learned, you'll need to explore, clean, and reformat your data before you can begin to answer your research questions. We recommend keeping track of these exploration and cleanup steps in a dedicated Jupyter Notebook, both for organization's sake and to make it easier to  present your work later.

Similarly, after you've massaged your data and are ready to start crunching numbers, you should keep track of your work in a Jupyter Notebook dedicated specifically to analysis.

During both phases, **don't forget to include plots**! Don't make the mistake of waiting to build figures until you're preparing your presentation. Creating them along the way can reveal insights and interesting trends in the data that you might not notice otherwise.

Finally, be sure that your projects meet the [technical requirements](TechnicalRequirements.md).

## Presentation

After you've analyzed your data to your satisfaction, you'll put together a presentation to show of your work, explain your process, and discuss your conclusions.

This presentation will be delivered as a slideshow, and should give your classmates and instructional staff an overview of your work. PowerPoint, Keynote, and Google Slides are all acceptable for building slides. 

As long as your slides meet the [presentation requirements](PresentationRequirements.md), you are free to structure the presentation however you wish, but students are often successful with the format laid out in the [presentation guidelines](PresentationGuidelines.md).

## Sample Ideas &amp; Inspiration



- - - 

### Copyright

Coding Boot Camp &copy; 2017. All Rights Reserved.


In [32]:

import json
import requests
from time import sleep
import pandas as pd

api_key = '1bb6cfd646261b1acda61748ca2bb5a7'



In [4]:

url = 'http://api.themoviedb.org/3'
search_type = '/search/tv'
query = "star%20trek"
dat = requests.get(url + search_type + "?page=1&query=" + query + "&api_key=" + api_key).json()


star_trek_id = dat['results'][0]['id']

In [5]:
star_trek_id

253

In [39]:
search_type = f'/tv/{star_trek_id}'

dat = requests.get(url + search_type + "?page=1&query=" + query + "&api_key=" + api_key).json()

In [40]:
seasons = []
for season in dat['seasons']:
    seasons.append({'season_num': season['season_number'], 'episode_count': season['episode_count']})
print(seasons[1])

{'season_num': 1, 'episode_count': 29}


In [43]:
search_type = f'/tv/{star_trek_id}/credits'

dat = requests.get(url + search_type + "?page=1&query=" + query + "&api_key=" + api_key).json()

main_cast = []
for character in dat['cast']:
    main_cast.append(character['name'])
# 'Majel Barrett', 'Bill Blackburn','Frank da Vinci' not really main cast?
main_cast


['William Shatner',
 'Leonard Nimoy',
 'DeForest Kelley',
 'Majel Barrett',
 'Walter Koenig',
 'George Takei',
 'Nichelle Nichols',
 'James Doohan',
 'Majel Barrett',
 'Bill Blackburn',
 'Frank da Vinci']

In [93]:
guest_list = []
guest_ids = []
id_to_name = {}
#season_num = 1
#episode_num = 1
for season in seasons:
    season_num = season['season_num']
    episode_count = season['episode_count']
    for episode_num in range(1,episode_count):
        print(season_num, ':' , episode_num, end = '/t')
        sleep(.3)
        dat = requests.get(f'https://api.themoviedb.org/3/tv/253/season/{season_num}/episode/{episode_num}?api_key=1bb6cfd646261b1acda61748ca2bb5a7&language=en-US').json()
        try:
            for star in dat['guest_stars']:
                if star['name'] not in guest_list and star['name'] not in main_cast:
                    guest_list.append(star['name'])
                    guest_ids.append(star['id'])
                    id_to_name[star['id']] = star['name']
                    
        except KeyError:
            print('uh-oh', season_num, ':' , episode_num)
print(guest_ids)

[83913, 168061, 82510, 14508, 1213153, 101742, 39770, 1214921, 153336, 90520, 164771, 1214891, 30551, 1214907, 78692, 1775091, 1214881, 1759, 15621, 1214880, 1214877, 15620, 121759, 1315468, 1555684, 135066, 1745, 1214885, 15626, 9805, 1214884, 1214883, 127601, 15628, 8496, 246, 15635, 1214878, 1766901, 1214888, 1769423, 12424, 1214887, 15645, 15644, 122004, 1461992, 1402879, 9600, 15650, 119245, 1212973, 1776386, 8490, 8747, 6451, 15655, 13869, 1729741, 14732, 15659, 161673, 157282, 1776894, 1556119, 1212769, 15949, 50974, 15661, 27126, 1214900, 936121, 161340, 161255, 15676, 1768121, 129676, 153675, 1820, 1213151, 1769185, 161663, 1214896, 196562, 12295, 1611570, 1769186, 1468833, 15693, 15692, 1720860, 178296, 15746, 1214902, 15699, 1768125, 139933, 1497387, 153501, 1214565, 84325, 15707, 1769187, 15714, 88451, 10161, 3339, 166047, 15719, 1472601, 19107, 4076, 15721, 10925, 1214904, 5695, 1768201, 1214905, 16659, 127646, 1793, 22048, 15724, 153458, 15744, 6838, 15753, 151491, 41719,

In [94]:
guest_list
id_to_name

{246: 'Gary Lockwood',
 291: 'Frank Gorshin',
 1209: 'Elinor Donahue',
 1436: 'Melvin Belli',
 1745: 'Gene Roddenberry',
 1759: 'Grace Lee Whitney',
 1793: 'Ricardo Montalban',
 1820: 'Mark Lenard',
 1947: 'Stanley Adams',
 2021: 'Jane Wyatt',
 2652: 'Ken Lynch',
 2782: 'Ian Wolfe',
 3339: 'Elisha Cook Jr.',
 4076: 'Morgan Farley',
 5247: 'John Fiedler',
 5695: 'Sid Haig',
 6451: 'Michael J. Pollard',
 6838: 'Frank Overton',
 7074: 'Peter Brocco',
 8490: 'John Megna',
 8496: 'Paul Fix',
 8499: 'William Windom',
 8747: 'Stephen McEveety',
 9596: 'Jeff Corey',
 9600: 'Ted Cassidy',
 9805: 'Sally Kellerman',
 9811: 'Fred Williamson',
 10161: 'Richard Webb',
 10925: 'Torin Thatcher',
 12295: 'Perry Lopez',
 12298: 'Roy Jenson',
 12310: 'Warren Stevens',
 12424: 'Gene Dynarski',
 12431: 'Carey Loftin',
 13259: 'James Daly',
 13637: 'Joan Collins',
 13786: 'Whit Bissell',
 13869: 'Morgan Woodward',
 13871: 'Lou Antonio',
 14069: 'Vic Tayback',
 14256: 'Dick Crockett',
 14508: 'John Hoyt',
 1

In [106]:
movies = {}
count = 0
for id_num in guest_ids:
    #id_num = 83913
    #index = 0
    url = f'https://api.themoviedb.org/3/person/{id_num}/movie_credits?api_key=1bb6cfd646261b1acda61748ca2bb5a7&language=en-US'
    dat = requests.get(url).json()
    sleep(.3)
    try:
        for movie in dat['cast']:
            #print(json.dumps(dat, indent = 2, sort_keys= True))
            #print(dat['cast'])
            if movie['id'] not in movies:
                movies[movie['id']] = {'Movie': movie['original_title'], 'movie_id': movie['id'], 'guest_names': [] , 'guest_ids': [], 'count': 0}
                
            movies[movie['id']]['guest_names'].append(id_to_name[id_num])
            movies[movie['id']]['guest_ids'].append(id_num)
            movies[movie['id']]['count'] += 1
    except KeyError:
        print('ERROR', url)
        #so far, produces one error for an actor who appears in no movies
    count += 1
    print(count, '/', len(guest_ids))
print(movies)

1 / 297
2 / 297
3 / 297
4 / 297
5 / 297
6 / 297
7 / 297
8 / 297
9 / 297
10 / 297
11 / 297
12 / 297
13 / 297
14 / 297
15 / 297
16 / 297
17 / 297
18 / 297
19 / 297
20 / 297
21 / 297
22 / 297
23 / 297
24 / 297
25 / 297
26 / 297
27 / 297
28 / 297
29 / 297
30 / 297
31 / 297
32 / 297
33 / 297
34 / 297
35 / 297
36 / 297
37 / 297
38 / 297
39 / 297
40 / 297
41 / 297
42 / 297
43 / 297
44 / 297
45 / 297
46 / 297
47 / 297
48 / 297
49 / 297
50 / 297
51 / 297
52 / 297
53 / 297
54 / 297
55 / 297
56 / 297
57 / 297
58 / 297
59 / 297
60 / 297
61 / 297
62 / 297
63 / 297
64 / 297
65 / 297
66 / 297
67 / 297
68 / 297
69 / 297
70 / 297
71 / 297
72 / 297
73 / 297
74 / 297
75 / 297
76 / 297
77 / 297
78 / 297
79 / 297
80 / 297
81 / 297
82 / 297
83 / 297
84 / 297
85 / 297
86 / 297
87 / 297
88 / 297
89 / 297
90 / 297
91 / 297
92 / 297
93 / 297
94 / 297
95 / 297
96 / 297
97 / 297
98 / 297
99 / 297
100 / 297
101 / 297
102 / 297
103 / 297
104 / 297
105 / 297
106 / 297
107 / 297
108 / 297
109 / 297
110 / 297
111 / 29

In [111]:
df = pd.DataFrame(movies).T

df = df.set_index('Movie')
df = df.sort_values('count',ascending = False)
df.to_csv('first_returns.csv')
df

Unnamed: 0_level_0,count,guest_ids,guest_names,movie_id
Movie,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
The Greatest Story Ever Told,6,"[135066, 1820, 15693, 24826, 15942, 16074]","[Abraham Sofaer, Mark Lenard, John Crawford, J...",2428
Spartacus,6,"[14508, 15949, 7074, 12431, 14256, 1214919]","[John Hoyt, Vic Perrin, Peter Brocco, Carey Lo...",967
Batman,5,"[100799, 14256, 1214919, 291, 16108]","[George Sawaya, Dick Crockett, Gil Perkins, Fr...",2661
Julius Caesar,5,"[14508, 4076, 2782, 16055, 16074]","[John Hoyt, Morgan Farley, Ian Wolfe, Richard ...",18019
Brute Corps,5,"[15628, 15719, 126903, 15993, 12298]","[Paul Carr, Charles Macaulay, Joseph Bernard, ...",102178
Emperor of the North Pole,5,"[1214888, 3339, 5695, 15781, 14069]","[Jim Goodwin, Elisha Cook Jr., Sid Haig, Hal B...",31943
Star Trek: Of Gods And Men,5,"[1759, 1213151, 152660, 15954, 16052]","[Grace Lee Whitney, Lawrence Montaigne, Arlene...",18231
True Grit,5,"[15620, 15655, 5247, 161405, 9596]","[Alfred Ryder, Kim Darby, John Fiedler, Ron So...",17529
The Cage,5,"[14508, 90520, 164771, 30551, 78692]","[John Hoyt, Susan Oliver, Peter Duryea, Jeffre...",433524
Scorpio,5,"[4076, 15765, 15954, 16030, 16051]","[Morgan Farley, John Colicos, Celeste Yarnall,...",42738


In [None]:
#which two star trek actors worked with each other the most

In [None]:
#compare which three star trek actors worked with each other the most

In [None]:
#which star trek actor worked with the most other star trek actors