<a href="https://colab.research.google.com/github/hammad93/hurricane-tts/blob/main/hurricane_tts.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Installation
Clone the repository, install its requirements, and run the tests in `test.py` before continuing.

In [None]:
!git clone https://github.com/hammad93/hurricane-tts.git

In [17]:
!git pull

Already up to date.


In [None]:
!pip install -r hurricane-tts/requirements.txt

In [2]:
import os
from google.colab import userdata
if not os.getenv("AZURE_OPENAI_API_KEY") :
  os.environ["AZURE_OPENAI_API_KEY"] = userdata.get('AZURE_OPENAI_API_KEY')

In [3]:
%cd hurricane-tts
!python test.py

[Errno 2] No such file or directory: 'hurricane-tts'
/content/hurricane-tts
Here are the constructed messages: [{'role': 'system', 'content': 'You are an AI assistant that helps people find information.'}, {'role': 'user', 'content': 'test'}]
..
----------------------------------------------------------------------
Ran 2 tests in 4.432s

OK


In [13]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [4]:
import prompts
import utils
# we generate prompts by ingesting live hurricane data and supported languages
storm_data = utils.transform_storm_data()
prompt_data = prompts.generate_prompts()
supported_langs = prompts.unique_lang_list()

In [5]:
# example
print(prompt_data['storms'][0])

These are the input storm records and forecasts.
Each record has a lat and lon according to their geographic coordinates and wind speed in knots.
Please respond with the 3 most relevant languages exactly as they appear from the supported language list other than English each delimited by a comma.
[{'type': 'history', 'lat': 4.8, 'lon': 152.0, 'time': '2023-11-08 18:00:00', 'wind_speed': 15}, {'type': 'history', 'lat': 5.1, 'lon': 151.4, 'time': '2023-11-09 00:00:00', 'wind_speed': 15}, {'type': 'history', 'lat': 5.3, 'lon': 150.8, 'time': '2023-11-09 06:00:00', 'wind_speed': 15}, {'type': 'forecast', 'lat': 5.5, 'lon': 151.0, 'time': '2023-11-09T18:00:00', 'wind_speed': 20}, {'type': 'forecast', 'lat': 5.9, 'lon': 151.5, 'time': '2023-11-10T06:00:00', 'wind_speed': 25}, {'type': 'forecast', 'lat': 6.4, 'lon': 152.0, 'time': '2023-11-10T18:00:00', 'wind_speed': 30}, {'type': 'forecast', 'lat': 6.7, 'lon': 152.5, 'time': '2023-11-11T06:00:00', 'wind_speed': 35}, {'type': 'forecast', 'lat

# Language Geography Inference from Tropical Storm Geographical Coordinates
We're trying to answer the question, "Which languages should we produce speech for when reporting on this tropical storm?" The pipeline always produces an English report and also uses a massively multilingual model so that emergency notices are available in local languages.
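
As a rough illustration of the validation this question implies, the sketch below parses a comma-delimited reply and keeps it only if every name appears in the supported list. `parse_language_reply` is a hypothetical helper, not the repository's `utils.llm_response_transform`, whose actual behavior may differ. The sketch assumes the reply contains nothing but language names, and it re-joins adjacent pieces because some supported names (for example 'Chin, Zyphe') contain a comma themselves.

```python
def parse_language_reply(reply, supported_langs, expected=3):
    """Return the supported language names found in a comma-delimited reply.

    Hypothetical helper for illustration only. Returns an empty list when
    the reply does not yield exactly `expected` supported names, so the
    caller can treat the reply as invalid and retry.
    """
    supported = set(supported_langs)
    pieces = [p.strip() for p in reply.split(',') if p.strip()]
    found = []
    i = 0
    while i < len(pieces):
        # Names such as 'Chin, Zyphe' were split in two; try the pair first.
        pair = ', '.join(pieces[i:i + 2])
        if i + 1 < len(pieces) and pair in supported:
            found.append(pair)
            i += 2
        elif pieces[i] in supported:
            found.append(pieces[i])
            i += 1
        else:
            return []  # unknown name: reject the whole reply
    return found if len(found) == expected else []


# Toy supported list for the example; the real list is much longer.
toy_supported = {'Tok Pisin', 'Malagasy', 'Chin, Zyphe', 'Tagalog', 'Indonesian'}
print(parse_language_reply('Malagasy, Tagalog, Indonesian', toy_supported))
print(parse_language_reply('Tok Pisin, Chin, Zyphe, Malagasy', toy_supported))
```

Rejecting the whole reply on any unknown name keeps the caller's logic simple: an empty list is falsy, which fits a retry loop that checks `if result`.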

In [22]:
max_retries = 5 # the LLM sometimes returns an unusable answer, so we retry
storm_langs = {} # keys are storm IDs and values are the languages
storm_chats = {} # stores chat histories for storms
storm_ids = list(storm_data.keys())
for index, storm_prompt in enumerate(prompt_data['storms']) :
  storm_id = storm_ids[index]
  retries = max_retries # reset the retry budget for each storm
  # Get the languages to generate the report for this storm
  while retries > 0 :
    response = prompts.chat(system=prompt_data['system'], message=storm_prompt)
    print(response)
    result = utils.llm_response_transform(
        resp=response, supported_langs=supported_langs)
    print(result)
    if result :
      break
    else :
      retries = retries - 1
      print(f"Failed. Retries left: {retries}")
  if retries < 1 :
    raise Exception("Couldn't produce a correct output from LLM.")

  # store results
  storm_langs[storm_id] = {'names': ['English'] + result}
  storm_chats[storm_id] = {
      'history' : [{'role': 'system', 'content': prompt_data['system']},
                   {'role': 'user', 'content': storm_prompt},
                   {'role': 'assistant', 'content': response}]
  }

Here are the constructed messages: [{'role': 'system', 'content': "You are an expert in languages according to their geographical location.\nI have a system for emergency notification of tropical storms that utilizes official data sources and creates speech audio from text with a massively multilingual model.\nThis is the list of supported languages,\n{'Chorote, Iyojwaâ\\x80\\x99ja', 'Chin, Zyphe', 'MÃ©nik', 'Nagamese', 'Benga', 'TsimanÃ©', 'Lahu', 'Quechua, North JunÃ\\xadn', 'Lango', 'Luang', 'Yala', 'Mengen', 'Lama', 'Ifugao, Batad', 'Shuar', 'Pele-Ata', 'Tsikimba', 'Paranan', 'Romani, Vlax', 'Mixtec, Chayuco', 'Iyo', 'Krumen, Tepo', 'Tok Pisin', 'Malagasy', 'Naga, Tangshang', 'Russian', 'Quechua, Panao', 'Bomu', 'Wanano', 'Burmese', 'Cuiba', 'Chhattisgarhi', 'Moru', 'Puinave', 'Tangoa', 'Abidji', 'Toba', 'Bambam', 'Tacana', 'Kyrgyz', 'Macushi', 'Tampulma', 'Anyin', 'Gwere', 'Mampruli', 'Toraja-Saâ\\x80\\x99dan', 'Komi-Zyrian', 'PaumarÃ\\xad', 'Ndogo', 'Garifuna', 'Mbandja', 'Kambaa
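
The retry pattern in the cell above can be factored into a small helper that gives every call its own retry budget. This is only a sketch: `call_with_retries` and its parameters are hypothetical names and are not part of the repository.

```python
def call_with_retries(produce, validate, max_attempts=5):
    """Call `produce()` until `validate(output)` returns a truthy value.

    Returns a tuple (output, validated). Raises RuntimeError when all
    `max_attempts` attempts fail validation.
    """
    for attempt in range(1, max_attempts + 1):
        output = produce()
        validated = validate(output)
        if validated:
            return output, validated
        print(f"Attempt {attempt} of {max_attempts} failed.")
    raise RuntimeError(f"No valid output after {max_attempts} attempts.")


# Stand-in producer and validator so the example runs without an LLM.
fake_replies = iter(['not a language', 'Malagasy'])
response, result = call_with_retries(
    produce=lambda: next(fake_replies),
    validate=lambda reply: [reply] if reply == 'Malagasy' else [],
)
```

In the notebook, `produce` would wrap the `prompts.chat(...)` call and `validate` would wrap `utils.llm_response_transform(...)`, so the loop over storms only has to store the returned pair.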

# Tropical Storm Report
This cell generates a report for each storm in every language selected for it, reusing the storm's chat history as context.
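
Reusing the chat history means the model sees the storm records and its own language choice before it is asked for a report. A minimal sketch of how such a message list might be assembled follows. `build_messages` is a hypothetical helper, and the repository's `prompts.chat` may assemble its messages differently. The template string is illustrative and is not the content of `prompts/report-prompt.txt`.

```python
def build_messages(message, history=None, system=None):
    """Return a chat message list: prior history (or a system prompt),
    followed by the new user turn. The input history is not modified."""
    messages = list(history) if history else []
    if system and not messages:
        messages.append({'role': 'system', 'content': system})
    messages.append({'role': 'user', 'content': message})
    return messages


# Illustrative history in the same shape as storm_chats[storm]['history'].
history = [
    {'role': 'system', 'content': 'You are an expert in languages.'},
    {'role': 'user', 'content': 'These are the input storm records and forecasts.'},
    {'role': 'assistant', 'content': 'Malagasy, Tagalog, Indonesian'},
]
template = "Write the storm report in {lang}."  # illustrative template
messages = build_messages(template.format(lang='Tagalog'), history=history)
print(messages[-1])
```

Copying the history instead of appending to it keeps each language's request independent, which matches the cell below: every report is requested against the same three-message history.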

In [23]:
with open('prompts/report-prompt.txt', 'r') as file:
  report_prompt = file.read()
for storm in storm_langs:
  reports = []
  for lang in storm_langs[storm]['names']:
    print(storm_chats[storm]['history'])
    # Construct the prompt
    message = report_prompt.format(lang=lang)
    print(message)
    # Send to LLM
    response = prompts.chat(
        message = message, history = storm_chats[storm]['history'])
    print(response)
    # store data
    reports.append(response)
  storm_langs[storm]['reports'] = reports

[{'role': 'system', 'content': "You are an expert in languages according to their geographical location.\nI have a system for emergency notification of tropical storms that utilizes official data sources and creates speech audio from text with a massively multilingual model.\nThis is the list of supported languages,\n{'Chorote, Iyojwaâ\\x80\\x99ja', 'Chin, Zyphe', 'MÃ©nik', 'Nagamese', 'Benga', 'TsimanÃ©', 'Lahu', 'Quechua, North JunÃ\\xadn', 'Lango', 'Luang', 'Yala', 'Mengen', 'Lama', 'Ifugao, Batad', 'Shuar', 'Pele-Ata', 'Tsikimba', 'Paranan', 'Romani, Vlax', 'Mixtec, Chayuco', 'Iyo', 'Krumen, Tepo', 'Tok Pisin', 'Malagasy', 'Naga, Tangshang', 'Russian', 'Quechua, Panao', 'Bomu', 'Wanano', 'Burmese', 'Cuiba', 'Chhattisgarhi', 'Moru', 'Puinave', 'Tangoa', 'Abidji', 'Toba', 'Bambam', 'Tacana', 'Kyrgyz', 'Macushi', 'Tampulma', 'Anyin', 'Gwere', 'Mampruli', 'Toraja-Saâ\\x80\\x99dan', 'Komi-Zyrian', 'PaumarÃ\\xad', 'Ndogo', 'Garifuna', 'Mbandja', 'Kambaata', 'AwajÃºn', 'Kakataibo-Kashibo'

In [24]:
storm_langs

{'WP952023': {'names': ['English', 'Malagasy', 'Tagalog', 'Indonesian'],
  'reports': ['Attention residents, a tropical storm is expected to occur on the following dates and locations: On November 9th, at 6:00 AM, it is projected to be at 5.3 latitude and 150.8 longitude with a wind speed of 15 knots. On November 9th, at 6:00 PM, it is predicted to be at 5.5 latitude and 151.0 longitude with a wind speed of 20 knots. On November 10th, at 6:00 AM, it is expected to be at 5.9 latitude and 151.5 longitude with a wind speed of 25 knots. On November 10th, at 6:00 PM, it will be at 6.4 latitude and 152.0 longitude with a wind speed of 30 knots. On November 11th, at 6:00 AM, it is forecasted to be at 6.7 latitude and 152.5 longitude with a wind speed of 35 knots. On November 12th, at 6:00 AM, it will be at 7.2 latitude and 153.5 longitude with a wind speed of 40 knots. Please stay updated on the situation and follow all safety protocols.',
   "Tandremo amin'ny trangaova nto sy ny fanadinana a

The following cells are adapted from the MMS TTS inference tutorial:
https://github.com/facebookresearch/fairseq/blob/main/examples/mms/tts/tutorial/MMS_TTS_Inference_Colab.ipynb


In [None]:
%cd ~/
!git clone https://github.com/jaywalnut310/vits.git
%cd vits/

!pip install Cython==0.29.21
!pip install librosa==0.8.0
!pip install phonemizer==2.2.1
!pip install scipy
!pip install numpy
!pip install torch
!pip install torchvision
!pip install matplotlib
!pip install Unidecode==1.1.1

%cd monotonic_align/
%mkdir monotonic_align
!python3 setup.py build_ext --inplace
%cd ../
%pwd