<a href="https://colab.research.google.com/github/grahamswanston/cap-comp215/blob/main/labs/lab2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

COMP 215 - LAB 2
----------------
#### Name: Graham Swanston
#### Date: January 20, 2022
This lab exercise is mostly a review of strings, tuples, lists, dictionaries, and functions.
We will also see how "list comprehension" provides a compact form for "list accumulator" algorithms.
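
As a warm-up, here is a minimal sketch of the two forms side by side. The `numbers` list is made-up sample data, not part of the lab; the point is only that the loop and the comprehension build the same list.

```python
# "List accumulator": start with an empty list and append inside a loop.
numbers = [3, 8, 5, 12, 7, 10]   # hypothetical sample data
even_squares_loop = []
for n in numbers:
    if n % 2 == 0:
        even_squares_loop.append(n ** 2)

# Equivalent conditional list comprehension: same filter, same transform, one expression.
even_squares = [n ** 2 for n in numbers if n % 2 == 0]

print(even_squares)   # [64, 144, 100]
```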

As usual, the first code cell simply imports all the modules we'll be using...

In [38]:
import json, requests
import matplotlib.pyplot as plt
from pprint import pprint

We'll answer some questions about movies and TV shows with the IMDb database:  https://www.imdb.com/
> using the IMDb API:  https://imdb-api.com/api

You can register for your own API key, or simply use the one provided below.

Here's an example query:
 *   search for TV Series with title == "Lexx"

In [39]:
API_KEY = 'k_ynffhhna'

title = 'lexx'
url = "https://imdb-api.com/en/API/SearchTitle/{key}/{title}".format(key=API_KEY, title=title)

response = requests.request("GET", url, headers={}, data={})

data = json.loads(response.text)  # recall json.loads from lab 1

results = data['results']
pprint(results)

[{'description': '(1996) (TV Series)',
  'id': 'tt0115243',
  'image': 'https://imdb-api.com/images/original/MV5BOGFjMzQyMTYtMjQxNy00NjAyLWI2OWMtZGVhMjk4OGM3ZjE5XkEyXkFqcGdeQXVyNzMzMjU5NDY@._V1_Ratio0.7273_AL_.jpg',
  'resultType': 'Title',
  'title': 'Lexx'},
 {'description': '(2008) (Video)',
  'id': 'tt1833738',
  'image': 'https://imdb-api.com/images/original/MV5BMjAyMTYzNjk4NV5BMl5BanBnXkFtZTcwNzE4MTU0NA@@._V1_Ratio0.7273_AL_.jpg',
  'resultType': 'Title',
  'title': 'Lexx'},
 {'description': '(2018)',
  'id': 'tt10800568',
  'image': 'https://imdb-api.com/images/original/MV5BZWY5ODYwNzYtMmIyMS00YzhhLTg0OTAtODM1M2I5YzkxMzY1XkEyXkFqcGdeQXVyMTEwNDU1MzEy._V1_Ratio0.7273_AL_.jpg',
  'resultType': 'Title',
  'title': 'Lexxy Roxx: Lexy 360 - Der Film'},
 {'description': '(2014) (Short)',
  'id': 'tt4396272',
  'image': 'https://imdb-api.com/images/original/nopicture.jpg',
  'resultType': 'Title',
  'title': 'Lexxxus'},
 {'description': '(2018) (Short)',
  'id': 'tt12646262',
  'image': 
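
The query above works because `'lexx'` contains no spaces or special characters. For other titles it may help to percent-encode the title before placing it in the URL. The sketch below uses only the standard library; the helper names `search_title_url` and `parse_results` are hypothetical, and treating a missing or null `results` field as an empty list is an assumption about how the API reports errors.

```python
import json
from urllib.parse import quote

def search_title_url(api_key, title):
    """Build a SearchTitle URL; quote() percent-encodes spaces and other unsafe characters."""
    return "https://imdb-api.com/en/API/SearchTitle/{key}/{title}".format(
        key=api_key, title=quote(title))

def parse_results(response_text):
    """Return the 'results' list from a JSON response body, or [] if it is missing or null."""
    data = json.loads(response_text)
    return data.get('results') or []

# Hypothetical response body, trimmed to the fields used in this lab.
sample_text = '{"results": [{"id": "tt0115243", "title": "Lexx", "description": "(1996) (TV Series)"}]}'

print(search_title_url('YOUR_KEY', 'doctor who'))
print(parse_results(sample_text)[0]['id'])
```

In the notebook, the string passed to `parse_results` would be `response.text` from the request.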

Next we extract the item we want from the data set by applying a "filter":

In [40]:
items = [item for item in results if item['title']=='Lexx' and "TV" in item['description']]
assert len(items) == 1
lexx = items[0]
pprint(lexx)

{'description': '(1996) (TV Series)',
 'id': 'tt0115243',
 'image': 'https://imdb-api.com/images/original/MV5BOGFjMzQyMTYtMjQxNy00NjAyLWI2OWMtZGVhMjk4OGM3ZjE5XkEyXkFqcGdeQXVyNzMzMjU5NDY@._V1_Ratio0.7273_AL_.jpg',
 'resultType': 'Title',
 'title': 'Lexx'}


## Exercise 1

In the code cell below, re-write the "list comprehension" above as a loop so you understand how it works.
Notice how the "conditional list comprehension" is a compact way to "filter" items of interest from a large data set.


In [41]:
items = []
for item in results:
    if item['title'] == 'Lexx' and "TV" in item['description']:
        items.append(item)
assert len(items) == 1   # check once, after the loop has seen every result
pprint(items[0])


{'description': '(1996) (TV Series)',
 'id': 'tt0115243',
 'image': 'https://imdb-api.com/images/original/MV5BOGFjMzQyMTYtMjQxNy00NjAyLWI2OWMtZGVhMjk4OGM3ZjE5XkEyXkFqcGdeQXVyNzMzMjU5NDY@._V1_Ratio0.7273_AL_.jpg',
 'resultType': 'Title',
 'title': 'Lexx'}


Notice that the `lexx` dictionary contains an `id` field that uniquely identifies this record in the database.

We can use the `id` to fetch other information about the TV series, for example,
*   get names of all actors in the TV Series Lexx


In [91]:
url = "https://imdb-api.com/en/API/FullCast/{key}/{id}".format(key=API_KEY, id=lexx['id'])
response = requests.request("GET", url, headers={}, data={})
data = json.loads(response.text)

actors = data['actors']
pprint(actors[:10])   # recall the slice operator (it's a long list!)


Notice that the `asCharacter` field contains a number of different pieces of data as a single string, including the character name.
This kind of "free-form" text data is notoriously challenging to parse...

## Exercise 2

In the code cell below, write a Python function that takes a string input (the text from the `asCharacter` field)
and returns the number of episodes, if available, or None.

Hints:
* notice this is a numeric value followed by the word "episodes"
* recall `str.split()`, `str.isdigit()`, and other string built-ins.

Add unit tests to cover as many cases from the `actors` data set above as you can.


In [93]:
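def episode_count(as_character):
  """Return the episode count from an `asCharacter` string, or None if absent."""
  tokens = as_character.split()
  for prev, word in zip(tokens, tokens[1:]):
    # the count is the numeric token right before the word "episode(s)"
    if word.startswith('episode') and prev.isdigit():
      return int(prev)
  return None


for actor in actors[:10]:
  print("This actor was in", episode_count(actor['asCharacter']), 'episodes.')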


This actor was in 61 episodes.
This actor was in 61 episodes.
This actor was in 57 episodes.
This actor was in 55 episodes.
This actor was in 46 episodes.
This actor was in 23 episodes.
This actor was in 16 episodes.
This actor was in 8 episodes.
This actor was in 13 episodes.
This actor was in 10 episodes.
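
The exercise also asks for unit tests. One possible shape is sketched below, using a regular-expression variant of the function so the cell stands on its own. The input strings are hypothetical examples that mimic the `asCharacter` format; they are not copied from the API response.

```python
import re

def episode_count(as_character):
    """Return the episode count as an int, or None if no count is present."""
    match = re.search(r'(\d+)\s+episodes?\b', as_character)
    return int(match.group(1)) if match else None

# Hypothetical inputs shaped like the `asCharacter` field.
assert episode_count('Stanley H. Tweedle 61 episodes, 1997-2002') == 61
assert episode_count('Holo Cleric / Priest 1 episode, 1999') == 1   # singular "episode"
assert episode_count('Guard #3 (uncredited)') is None               # a digit, but no count
assert episode_count('') is None
print('all episode_count tests passed')
```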


## Exercise 3

In the code cell below, write a Python function that takes a string input (the text from the `asCharacter` field)
and returns just the character name.  This one may be even a little harder!

Hints:
* notice the character name is usually followed by a forward slash, `/`
* don't worry if your algorithm does not perfectly parse every character's name --
it may not really be possible to correctly handle all cases because the field format does not follow consistent rules

Add unit tests to cover as many cases from the `actors` data set above as you can.


In [92]:
def character_name(as_character):
  """Return the character name from an `asCharacter` string (best effort)."""
  if '/' in as_character:
    # the name is usually the text before the first forward slash
    return as_character.split('/')[0].strip()
  # no slash: keep the words up to the first all-digit token (the episode count)
  name = []
  for word in as_character.split():
    if word.isdigit():
      break
    name.append(word)
  return ' '.join(name)

for actor in actors[10::10]:
  print(character_name(actor['asCharacter']))

Divine Predecessor
Holo Cleric
Holo Judge
Blue Team Member
E.J. Moss, Commander of Eagle 5
Lomea
Road Worker
Computer
Black Pawn #3
Master of Ceremonies
Brother Treygor
Groo
Thodin
Kyoo
Older Woman
Gibble
Dale
Dream Girl #1
Journalist #2
Rockhound
Professor Shnoog
Middle Aged Son
Guard #3
Nurse
Older Husband
Older Wife
Gypsy Servant
Lace
Survivalist
Frankie
Businessman
Hank
Guard #1
Cmdr. Bricklin
Computer
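
For the unit tests requested above, here is a hedged sketch built around an alternative, regular-expression version of the parser. The name `character_name_re` and the test strings are hypothetical; the strings only imitate the `asCharacter` format. The idea is to cut the text at the first `/` or at the "n episode(s)" marker, whichever comes first, so a number inside a name is not mistaken for the episode count.

```python
import re

# Split at the first "/" or at the "<n> episode(s)" marker, whichever comes first.
_NAME_END = re.compile(r'\s*/\s*|\s+\d+\s+episodes?\b')

def character_name_re(as_character):
    """Best-effort character name from an `asCharacter` string."""
    return _NAME_END.split(as_character, maxsplit=1)[0].strip()

# Hypothetical inputs shaped like the `asCharacter` field.
assert character_name_re('Divine Predecessor / His Divine Shadow 5 episodes, 1997-2001') == 'Divine Predecessor'
assert character_name_re('Nurse 1 episode, 2000') == 'Nurse'
assert character_name_re('Guard #3 1 episode, 1999') == 'Guard #3'
assert character_name_re('Agent 99 2 episodes, 2001') == 'Agent 99'   # number inside the name survives
print('all character_name_re tests passed')
```

As the hints say, no rule set will parse every entry; a name that itself contains a slash would still be cut short.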




## Exercise 4

Using the functions you developed above, define 2 list comprehensions that:
* create a list of 2-tuples with (actor name, character description) for actors in Lexx  (from the `asCharacter` field)
* create a list of dictionaries, with keys:  'actor' and 'character' for the same data

Hint: this is a very simple problem - the goal is to learn how to build these lists using a comprehension.

Pretty print (pprint) your lists to visually verify the results.

In [45]:
# your code here
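
One possible starting point, shown on made-up records so it runs without the API. The `sample_actors` list is hypothetical, and the `'name'` key for the actor's name is an assumption about the response; check it against the real `actors` data before relying on it.

```python
from pprint import pprint

# Hypothetical records shaped like entries of the `actors` list;
# the 'name' key is an assumption about the API response.
sample_actors = [
    {'name': 'Actor One', 'asCharacter': 'Nurse 1 episode, 2000'},
    {'name': 'Actor Two', 'asCharacter': 'Holo Cleric / Priest 2 episodes, 1999'},
]

# List of 2-tuples: (actor name, character description)
cast_tuples = [(a['name'], a['asCharacter']) for a in sample_actors]

# List of dictionaries with keys 'actor' and 'character', built from the same data
cast_dicts = [{'actor': name, 'character': desc} for name, desc in cast_tuples]

pprint(cast_tuples)
pprint(cast_dicts)
```

Replacing `a['asCharacter']` with a call to the character-name function from Exercise 3 would store just the name instead of the full description.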