# Picovoice Interview Questions
____

Given two strings, compute the minimum number of edits
needed to transform the first string into the second string. A single edit is an insertion,
deletion, or substitution of a single character.

In [9]:
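def compute_edits(str1, str2):
    """Return the edit (Levenshtein) distance between str1 and str2."""
    # Base cases: converting to or from an empty string costs one
    # insertion or deletion per remaining character.
    if len(str1) == 0:
        return len(str2)

    if len(str2) == 0:
        return len(str1)

    # Matching last characters cost nothing; recurse on the prefixes.
    if str2[-1] == str1[-1]:
        return compute_edits(str1[:-1], str2[:-1])

    # Otherwise pay one edit and take the cheapest of the three options.
    return 1 + min(compute_edits(str1, str2[:-1]),       # insertion
                   compute_edits(str1[:-1], str2),       # deletion
                   compute_edits(str1[:-1], str2[:-1]))  # substitution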

In [10]:
compute_edits("Hi", "Hil")

1

In [11]:
compute_edits("Take", "Jake")

1

In [12]:
compute_edits("fame", "famous")

3

Note that capital letters count as different characters, so the comparison is case-sensitive:

In [13]:
compute_edits("A", "a")

1
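
The recursive solution recomputes the same subproblems many times, so its running time grows exponentially with the input length. A minimal bottom-up sketch of the same recurrence is shown here; the name `compute_edits_dp` is an illustrative choice, not part of the original notebook.

```python
def compute_edits_dp(str1, str2):
    """Bottom-up edit distance in O(len(str1) * len(str2)) time."""
    # previous[j] is the distance between the current prefix of str1 and str2[:j].
    previous = list(range(len(str2) + 1))
    for i, ch1 in enumerate(str1, start=1):
        current = [i]  # turning str1[:i] into "" takes i deletions
        for j, ch2 in enumerate(str2, start=1):
            cost = 0 if ch1 == ch2 else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # match or substitution
        previous = current
    return previous[-1]
```

Only two rows of the table are kept at a time, so memory use is proportional to the length of the second string.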

Given an input string and a pattern, implement regular
expression matching with support for `.` and `*`.
- `.` Matches any single character
- `*` Matches zero or more of the preceding element

_____

In [14]:
def match_regex(str, pattern):
    None
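
The stub above is left unimplemented. One possible approach is a memoized recursion over positions in the string and the pattern. The sketch below assumes the pattern must match the entire input (the problem statement does not say whether partial matches count), and the name `match_regex_sketch` is hypothetical.

```python
from functools import lru_cache


def match_regex_sketch(text, pattern):
    """Return True if pattern matches all of text ('.' and '*' supported)."""

    @lru_cache(maxsize=None)
    def matches(i, j):
        # Pattern exhausted: succeed only if the text is exhausted too.
        if j == len(pattern):
            return i == len(text)
        # Does the current text character match the current pattern element?
        first = i < len(text) and pattern[j] in (text[i], ".")
        if j + 1 < len(pattern) and pattern[j + 1] == "*":
            # Either skip "x*" entirely, or consume one character and stay on "x*".
            return matches(i, j + 2) or (first and matches(i + 1, j))
        return first and matches(i + 1, j + 1)

    return matches(0, 0)
```

Memoizing on the index pair `(i, j)` keeps the work bounded by the product of the two lengths.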

____

Implement (in NumPy) a unidirectional multi-layer LSTM classifier
with coupled input and forget gates. You can find information about this LSTM
variant [here](https://arxiv.org/pdf/1503.04069.pdf?utm_content=buffereddc5&utm_medium=social&utm_source=plus.google.com&utm_campaign=buffer) (look for CIFG). The model should accept a feature vector as input and
emit the corresponding posterior. Then train a character-based language model to
generate text resembling Shakespeare (use any online dataset you see fit).
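
As a starting point, the forward step of a single CIFG cell could look like the sketch below. It covers only one time step of one layer, leaves out peephole connections, and says nothing about the classifier head or training. The weight layout (`W` stacking the input gate, output gate and candidate blocks) and the names `cifg_step` and `sigmoid` are assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def cifg_step(x, h_prev, c_prev, W, b):
    """One forward step of a CIFG-LSTM cell (no peepholes).

    Assumed shapes: W is (3 * hidden, input + hidden), b is (3 * hidden,).
    """
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[:hidden])              # input gate
    o = sigmoid(z[hidden:2 * hidden])    # output gate
    g = np.tanh(z[2 * hidden:])          # candidate cell input
    f = 1.0 - i                          # forget gate coupled to the input gate
    c = f * c_prev + i * g               # new cell state
    h = o * np.tanh(c)                   # new hidden state
    return h, c
```

Because the forget gate is derived from the input gate, the cell needs three weight blocks instead of the four used by a standard LSTM.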

___

You are tasked with collecting 1000 hours of “labeled English speech data” for training
purposes. The data consists of pairs of audio files and their corresponding transcripts. The
transcripts should be 99%+ accurate. How do you go about this? How fast can you
gather 100 hours? How about 10000 hours? Provide as much detail as possible.
HINT: TED Talks are freely available for download and are also hand-transcribed.

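One approach is to pull captions from YouTube with the `youtube_transcript_api` package,

`from youtube_transcript_api import YouTubeTranscriptApi`

We can first check which transcripts a video has. `list_transcripts` takes only the video ID; the language preference is passed in the next step,

`transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)`

Then we pick the English transcript and download it by calling `fetch()`,

`transcript = transcript_list.find_transcript(['en']).fetch()`

Since the target is 99%+ accuracy, manually created transcripts are preferable to auto-generated captions; `find_manually_created_transcript(['en'])` restricts the lookup to those. These are the class-level calls from releases of the package before 1.0; newer releases expose a similar instance-based interface.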
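
To keep track of how much data has been gathered, the fetched segments can be turned into plain text and an approximate duration. This sketch assumes each segment is a dict with `text`, `start` and `duration` keys, which is the shape returned by `fetch()` in pre-1.0 releases of `youtube_transcript_api`; the helper names are hypothetical.

```python
def transcript_text(segments):
    """Join caption segments into one transcript string."""
    parts = (seg["text"].strip() for seg in segments)
    return " ".join(part for part in parts if part)


def transcript_hours(segments):
    """Estimate audio length in hours from the latest segment end time."""
    if not segments:
        return 0.0
    return max(seg["start"] + seg["duration"] for seg in segments) / 3600.0


def fetch_english_segments(video_id):
    """Fetch a manually created English transcript (needs network access)."""
    # Imported lazily so the helpers above work without the package installed.
    from youtube_transcript_api import YouTubeTranscriptApi

    transcripts = YouTubeTranscriptApi.list_transcripts(video_id)
    return transcripts.find_manually_created_transcript(["en"]).fetch()
```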

...

Next, we download the audio with the `youtube_dl` package. The options below request the best available audio stream and convert it to .mp3, which requires FFmpeg to be installed. Passing them to `YoutubeDL` downloads the audio,

```
import youtube_dl

# Audio only, converted to .mp3 by the FFmpeg post-processor.
ydl_opts = {
    'format': 'bestaudio/best',
    'postprocessors': [{
        'key': 'FFmpegExtractAudio',
        'preferredcodec': 'mp3',
        'preferredquality': '192',
    }],
}

with youtube_dl.YoutubeDL(ydl_opts) as ydl:
    ydl.download(['video link'])
```
    
...

Finally, we need the links to videos from well-regarded YouTube channels that offer clear speech, a wide range of topics and a varied vocabulary. Education channels such as TED and TEDx Talks are good candidates.

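First we install the `google-api-python-client` package, which is imported under the name `googleapiclient`,

`from googleapiclient.discovery import build`

Then we can list a channel's videos by calling the YouTube Data API v3 `search.list` endpoint with the channel ID and paging through the JSON responses.

An older alternative was to query the GData feed with `urllib`, for example,

`urllib.urlopen(r'http://gdata.youtube.com/feeds/api/videos?start-index={0}&max-results=50&alt=json&orderby=published&author={1}'.format( ind, author ) )`

to retrieve a list of YouTube URLs under a specific channel. That feed belonged to version 2 of the YouTube Data API, which has since been shut down, so the v3 API is the route that still works.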
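
The Data API route could be sketched as follows. `extract_video_urls` and `list_channel_videos` are hypothetical helper names, `max_pages` is an illustrative cap, and the second function needs a valid API key and network access, so treat it as an outline and not a tested crawler.

```python
def extract_video_urls(response):
    """Turn one search.list response (a dict) into watch URLs."""
    return [
        "https://www.youtube.com/watch?v=" + item["id"]["videoId"]
        for item in response.get("items", [])
        if item.get("id", {}).get("kind") == "youtube#video"
    ]


def list_channel_videos(api_key, channel_id, max_pages=5):
    """Page through a channel's videos via the YouTube Data API v3."""
    # Imported lazily so extract_video_urls works without the client installed.
    from googleapiclient.discovery import build

    youtube = build("youtube", "v3", developerKey=api_key)
    urls, page_token = [], None
    for _ in range(max_pages):
        response = youtube.search().list(
            part="id", channelId=channel_id, type="video",
            maxResults=50, pageToken=page_token,
        ).execute()
        urls.extend(extract_video_urls(response))
        page_token = response.get("nextPageToken")
        if page_token is None:  # no further pages
            break
    return urls
```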

That should already give us more than 1000 hours of audio with transcripts. If it isn't enough, we could turn to audiobooks on YouTube, which often run 10 hours or more, such as 'To Kill a Mockingbird' (12+ hours) or 'Lord of the Rings' (Book 1, 12+ hours). This raises two problems, though: these audiobooks usually don't come with subtitles, and using them could raise a lot of legal issues.

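Another option is to expand the pool through related videos, which the search endpoint exposed through the `relatedToVideoId` parameter (Google has since deprecated this parameter, so check the current API reference before relying on it),

`GET https://www.googleapis.com/youtube/v3/search?part=snippet&relatedToVideoId=____&type=video&key={YOUR_API_KEY}`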

Then we check whether each related video has subtitles. If it does, we process it as described above; otherwise we keep going through the related-videos list until we have enough hours.
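
The crawl described here amounts to a breadth-first search over related videos. In the sketch below, `has_transcript`, `duration_hours` and `related_ids` are hypothetical callables supplied by the caller (each takes a video ID), standing in for the transcript check, a duration lookup and the related-videos query. As a simplification, it expands the related list of every visited video, whether or not that video had subtitles.

```python
from collections import deque


def collect_videos(seed_ids, target_hours, has_transcript, duration_hours, related_ids):
    """Breadth-first crawl over related videos until target_hours is reached."""
    queue = deque(seed_ids)
    seen = set(seed_ids)
    selected, total = [], 0.0
    while queue and total < target_hours:
        video_id = queue.popleft()
        if has_transcript(video_id):
            # Keep the video and count its length towards the target.
            selected.append(video_id)
            total += duration_hours(video_id)
        for neighbour in related_ids(video_id):
            if neighbour not in seen:  # avoid revisiting videos
                seen.add(neighbour)
                queue.append(neighbour)
    return selected, total
```

The `seen` set keeps the crawl from looping when two videos list each other as related.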

____