# Launch Fleet

```
https://github.com/hudsonmendes/lambda-tmdb-distributed-downloader
MENDES, Hudson
14th May, 2020
London, UK
```

## Summary

The `lambda TMDB distributed downloader` is an **[AWS Lambda Function](https://aws.amazon.com/lambda/)** hooked to an **[AWS SQS Queue](https://aws.amazon.com/sqs/)**.

To launch the download fleet, we send messages to **SQS**; the Lambda function picks them up and starts the download process.

This notebook reads the **IMDB Titles Dataset** to determine the download partitions (based on the `year` and the `initial` letter of each title) and sends one message per partition.
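
For orientation, here is a minimal sketch of how an SQS-triggered Lambda function could read these partition messages. It is not the repository's actual handler: `parse_partitions` and `handler` are hypothetical names, and the only things assumed are the standard SQS event shape (`Records`, each with a JSON `body`) and the `year` / `initial` keys sent by this notebook.

```python
import json


def parse_partitions(event):
    """Return the (year, initial) pairs carried by an SQS-triggered Lambda event."""
    partitions = []
    for record in event.get("Records", []):
        # Each record body is the JSON string that was sent to the queue
        message = json.loads(record["body"])
        partitions.append((message["year"], message["initial"]))
    return partitions


def handler(event, context):
    # Hypothetical entry point; the real download logic lives in the repository
    for year, initial in parse_partitions(event):
        print(f"would download partition year={year} initial={initial}")


# A hand-built event with one record, shaped like what Lambda receives from SQS
sample_event = {"Records": [{"body": json.dumps({"year": 1999, "initial": "M"})}]}
handler(sample_event, None)
```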

## Environment

### Dependencies

In [None]:
%%bash
pip install -U pandas
pip install -r requirements.txt

### Requirements

Before you start, ensure that you have the following components ready to be used:
1. AWS SQS Queue
2. AWS Lambda Function Created
3. AWS Lambda Function Deployed

If you have not yet prepared the environment, please run the following cell:

In [None]:
%%bash
python . infra
python . deploy

### Imports

In [None]:
import boto3
import json
import pandas as pd
from tdd import IMDbMovie

### Storage

In [None]:
IMDB_MOVIES_PATH = 'https://datasets.imdbws.com/title.basics.tsv.gz'

### AWS SQS

In [None]:
sqs = boto3.resource('sqs')
queue = sqs.get_queue_by_name(QueueName='hudsonmendes-imdb2tmdb-movies-download-queue')

## Data

### IMDB, Titles

In [None]:
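# Read every column as text: IMDb marks missing values with '\N', which otherwise yields mixed-type columns
df = pd.read_csv(IMDB_MOVIES_PATH, delimiter='\t', header=0, dtype=str)
# Derive the partition letter from each title
df['initial'] = df['primaryTitle'].map(IMDbMovie.get_initial_from)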

### Fleet Partitions

In [None]:
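from itertools import product

# Distinct, non-empty initials found in the titles
initials = sorted({ initial for initial in df['initial'].dropna() if initial })
# Distinct start years as plain ints; missing values ('\N' in the IMDb file) are dropped
years = sorted({ int(year) for year in pd.to_numeric(df['startYear'], errors='coerce').dropna() })
# One partition per (year, initial) combination; zip would only pair the two lists positionally
partitions = list(product(years, initials))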
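
The partitioning scheme in the summary calls for one partition per `year` and `initial` combination. The sketch below, using made-up values rather than the real dataset, shows how `itertools.product` enumerates every combination, whereas `zip` pairs items positionally and stops at the shorter list.

```python
from itertools import product

# Illustrative values only, not taken from the IMDb file
years = [1999, 2000, 2001]
initials = ["A", "B"]

# zip pairs by position and stops when the shorter list runs out
pairwise = list(zip(years, initials))

# product yields every (year, initial) combination
combined = list(product(years, initials))

print(len(pairwise), len(combined))
```

With these inputs, `pairwise` holds 2 pairs while `combined` holds all 6 combinations.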

## Launching

In [None]:
for year, initial in partitions:
    print((year, initial))
    message = { 'year': year, 'initial': initial }
    body = json.dumps(message)
    queue.send_message(MessageBody=body)
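
As an alternative to sending one message per request, the sketch below groups partitions into SQS batches. It assumes the boto3 `Queue.send_messages` call, which accepts at most 10 entries per request; `batched_entries` and `send_in_batches` are hypothetical helper names, not part of the repository.

```python
import json


def batched_entries(partitions, batch_size=10):
    """Yield lists of SQS batch entries, at most batch_size entries per list."""
    batch = []
    for index, (year, initial) in enumerate(partitions):
        batch.append({
            "Id": str(index),  # must be unique within a single batch request
            "MessageBody": json.dumps({"year": year, "initial": initial}),
        })
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, partially filled batch


def send_in_batches(queue, partitions):
    """Send all partitions; return the number of entries SQS reported as failed."""
    failed = 0
    for entries in batched_entries(partitions):
        response = queue.send_messages(Entries=entries)
        failed += len(response.get("Failed", []))
    return failed
```

Calling `send_in_batches(queue, partitions)` with the `queue` resource created earlier would replace the loop above; a non-zero return value signals entries that should be retried.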