# NLP Final Project


This project was carried out during one week in March 2023, in the data science master's course at the Assembler Institute of Technology.<p>
    
Team: Lien Chin, Helen Navarro, Sergio Salvador, Francisco Ávila.

## Task

The company sky2travel hires us to carry out a transactional bot, for which they ask us to carry out an initial demo of the project to check the viability of the requirements. Sky2travel is a company focused on the search for flights and travel, its large customer base is between 18 and 40 years old, and makes intensive use of mobile phones, which means that 70% of its sales are made from these devices. <p>
They have detected the need of their customers to search and buy flights from their mobile phones in a faster and more convenient way, so they are looking to integrate their bot with whatsapp, telegram and other platforms such as chat on the web, facebook messenger and similar. <p>
The aim is that customers, with a simple text message like "Bills from Madrid to London in August for 3 days" or "Cheap tickets to Berlin with Lufthansa", the application will be able to process the customer's request and send them the information easily to their mobile phone along with a direct link to payment.<p>
For the demo, we are asked that the script can generate the requests to be sent to the Amadeus booking software in JSON format. It will be enough to create a notebook in ipynb format to demo the functionality.


## Import Libraries

In [1]:
from nltk.chat.util import Chat, reflections
import nltk
from nltk import UnigramTagger, BigramTagger, TrigramTagger
from nltk.tag.hmm import HiddenMarkovModelTagger
from nltk.chunk.regexp import *
from nltk.corpus import cess_esp
from nltk.tokenize import word_tokenize
from nltk.chunk.regexp import *
from assembler import pln
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings("ignore")

## Create the corpus

We made a corpus with 26 senteces related with our topic 'Booking flights'

In [2]:
corpus= [
    
    'Quiero 2 billetes de Madrid a Frankfurt en Septiembre',
    'Necesito comprar un billete a Madrid el 5 de Agosto',
    'Comprar billete Barcelona a Roma para el 25 de Agosto con Iberia',
    'Billete barato AirEuropa de Madrid a Sevilla', 
    'Necesito un billete de Tenerife a Jerez',
    'Quiero un vuelo para París el 15 de julio',
    'Volar desde Florencia a Barcelona',
    'Necesito comprar un billete a Quito el 2 de Noviembre',
    'Billete económico Iberia de Madrid a México',
    'Necesito un vuelo de ida y vuelta para el 20 de diciembre a Madrid desde Jerez'
    'Quiero reservar un vuelo para tres personas desde Londres a Madrid el 1 de julio',
    '2 vuelos a Paris el 2 de Abril con Easyjet',
    'Vuelo economico a Berlín', 
    '5 billetes de avion a Canadá con Emirates',
    'Cómprame 1 billete de madrid a burdeos',
    'Quiero un billete desde Valencia  a Sydney',
    'Quiero cuatro billetes a Bali',
    'Comprar 5 billetes a Madagascar',
    'Hola, quisiera reservar un vuelo de ida y vuelta desde Valencia a Madrid con fecha para el 10 de agosto.',
    'Buen día, necesito un vuelo de ida desde Barcelona a Mallorca el próximo viernes',
    'Hola, quisiera reservar un vuelo de ida y vuelta desde Bilbao a Madrid con fecha para el 20 de mayo',
    'Buen día, necesito un vuelo de ida desde Málaga a Barcelona el próximo 5 de mayo',
    'Hola, quisiera reservar un vuelo de ida y vuelta desde Sevilla a Valencia con fecha para el 25 de septiembre',
    'Madrid - Barcelona en Qatar Airways',
    'Hola, quisiera reservar un vuelo de ida y vuelta para Gran Canaria desde Los Ángeles con fecha para el 10 de agosto',
    'Hola, quisiera reservar un vuelo de ida y vuelta desde La Pampa a Ciudad de México con fecha para el diez de agosto'
    
]

## Tokenize

First of all we split the sentences in words to tag later

In [3]:
# Function that tokenise phrases

def tokenizar(_frase):
    
    tokens= word_tokenize(_frase, "spanish")

    return tokens

In [4]:
frases_tokens= []

for frase in corpus:
    
    frases_tokens.append(tokenizar(frase))   
    
frases_tokens

[['Quiero',
  '2',
  'billetes',
  'de',
  'Madrid',
  'a',
  'Frankfurt',
  'en',
  'Septiembre'],
 ['Necesito',
  'comprar',
  'un',
  'billete',
  'a',
  'Madrid',
  'el',
  '5',
  'de',
  'Agosto'],
 ['Comprar',
  'billete',
  'Barcelona',
  'a',
  'Roma',
  'para',
  'el',
  '25',
  'de',
  'Agosto',
  'con',
  'Iberia'],
 ['Billete', 'barato', 'AirEuropa', 'de', 'Madrid', 'a', 'Sevilla'],
 ['Necesito', 'un', 'billete', 'de', 'Tenerife', 'a', 'Jerez'],
 ['Quiero', 'un', 'vuelo', 'para', 'París', 'el', '15', 'de', 'julio'],
 ['Volar', 'desde', 'Florencia', 'a', 'Barcelona'],
 ['Necesito',
  'comprar',
  'un',
  'billete',
  'a',
  'Quito',
  'el',
  '2',
  'de',
  'Noviembre'],
 ['Billete', 'económico', 'Iberia', 'de', 'Madrid', 'a', 'México'],
 ['Necesito',
  'un',
  'vuelo',
  'de',
  'ida',
  'y',
  'vuelta',
  'para',
  'el',
  '20',
  'de',
  'diciembre',
  'a',
  'Madrid',
  'desde',
  'JerezQuiero',
  'reservar',
  'un',
  'vuelo',
  'para',
  'tres',
  'personas',
  'desde'

## Choosing model

We made a manual train, choose the better model to perform our task and train it

In [5]:
manual_train= [
    
    [('Quiero', 'vmip1s0'), ('2', 'z0'), ('billetes', 'ncfp000'), ('de', 'sps00'), ('Madrid', 'np00001'), ('a', 'x'), ('Frankfurt', 'np00001'), ('en', 'sps00'), ('Septiembre', 'npcs000')], 
    [('Necesito', 'vmip1s0'), ('comprar', 'vmn0000'), ('un', 'di0ms0'), ('billete', 'ncms000'), ('a', 'x'), ('Madrid', 'np00001'), ('el', 'da0ms0'), ('5', 'z0'), ('de', 'sps00'), ('Agosto', 'npcs000')], 
    [('Comprar', 'vmn0000'), ('billete', 'ncms000'), ('Barcelona', 'np00001'), ('a', 'x'), ('Roma', 'np00001'), ('para', 'x'), ('el', 'da0ms0'), ('25', 'z0'), ('de', 'sps00'), ('agosto', 'npcs000'), ('con', 'sps00'), ('Iberia', 'np00001')], 
    [('Billete', 'ncms000'), ('barato', 'aq0ms0'), ('AirEuropa', 'np00001'), ('de', 'sps00'), ('Madrid', 'np00001'), ('a', 'x'), ('Sevilla', 'np00001')],
    [('Quiero', 'vmip1s0'), ('ir', 'vmn0000'), ('a', 'x'), ('Roma', 'np00001'), ('con', 'sps00'), ('Lufthansa', 'np00001')],
    [('Necesito', 'vmip1s0'), ('un', 'di0ms0'), ('billete', 'ncms000'), ('de', 'sps00'), ('Tenerife', 'np0000l'), ('a', 'x'), ('Jerez', 'np0000l')],
    [('Quiero', 'vmip1s0'), ('un', 'di0ms0'), ('vuelo', 'ncms000'), ('para', 'x'), ('París', 'np0000l'), ('el', 'da0ms0'), ('15', 'z0'), ('de', 'sps00'), ('julio', 'npcs000')], 
    [('Volar', 'vmn0000'), ('desde', 'sps00'), ('Florencia', 'np0000l'), ('a', 'x'), ('Barcelona', 'np0000l')],
    [('Necesito', 'vmip1s0'), ('comprar', 'vmn0000'), ('un', 'di0ms0'), ('billete', 'ncms000'), ('a', 'x'), ('Quito', 'np0000l'), ('el', 'da0ms0'), ('2', 'z0'), ('de', 'sps00'), ('Noviembre', 'npcs000')],
    [('Billete', 'ncms000'), ('económico', 'aq0cs0'), ('Iberia', 'np0000l'), ('de', 'sps00'), ('Madrid', 'np0000l'), ('a', 'x'), ('México', 'np0000l')],
    [('Necesito', 'vmip1s0'), ('un', 'di0ms0'), ('vuelo', 'ncms000'), ('de', 'sps00'), ('ida', 'ncfs000'), ('y', 'cc00'), ('vuelta', 'ncfs000'), ('para', 'x'), ('el', 'da0ms0'), ('20', 'z0'), ('de', 'sps00'), ('diciembre', 'npcs000'), ('a', 'x'), ('Madrid', 'np0000l'), ('desde', 'sps00'), ('Jerez', 'np0000l')],
    [('Quiero', 'vmip1s0'), ('reservar', 'vmn0000'), ('un', 'di0ms0'), ('vuelo', 'ncms000'), ('para', 'x'), ('tres', 'dn0cp0'), ('personas', 'ncfp000'), ('desde', 'sps00'), ('Londres', 'np0000l'), ('a', 'x'), ('Madrid', 'np0000l'), ('el', 'da0ms0'), ('1', 'z0'), ('de', 'sps00'), ('julio', 'npcs000')],
    [('2', 'z0'), ('vuelos', 'ncmp000'), ('a', 'x'), ('Paris', 'np0000l'), ('el', 'da0ms0'), ('2', 'z0'), ('de', 'sps00'), ('Abril', 'npcs000'), ('con', 'sps00'), ('Easyjet', 'np0000l')],
    [('Vuelo', 'ncms000'), ('económico', 'aq0cs0'), ('a', 'x'), ('Berlín', 'np0000l')],
    [('5', 'z0'), ('billetes', 'ncmp000'), ('de', 'sps00'), ('avión', 'ncms000'), ('a', 'x'), ('Canadá', 'np0000l'), ('con', 'sps00'), ('Emirates', 'np0000l')],
    [('Hola', 'i'), ('quisiera', 'vmic1s0'), ('reservar', 'vmn0000'), ('un', 'di0ms0'), ('vuelo', 'ncms000'), ('de', 'sps00'), ('ida', 'ncfs000'), ('y', 'cc00'), ('vuelta', 'ncfs000'), ('desde', 'sps00'), ('Valencia', 'np0000l'), ('a', 'x'), ('Madrid', 'np0000l'), ('con', 'sps00'), ('fecha', 'ncfs000'), ('para', 'x'), ('el', 'da0ms0'), ('10', 'z0'), ('de', 'sps00'), ('agosto', 'npcs000')],
    [('Buen', 'aq0ms0'), ('día', 'ncms000'), (',', 'fc'), ('necesito', 'vmip1s0'), ('un', 'di0ms0'), ('vuelo', 'ncms000'), ('de', 'sps00'), ('ida', 'ncfs000'), ('desde', 'sps00'), ('Barcelona', 'np0000l'), ('a', 'x'), ('Mallorca', 'np0000l'), ('el', 'da0ms0'), ('próximo', 'aq0ms0'), ('viernes', 'nccn000')],
    [('Hola', 'i'), (',', 'fc'), ('quisiera', 'vmic1s0'), ('reservar', 'vmn0000'), ('un', 'di0ms0'), ('vuelo', 'ncms000'), ('de', 'sps00'), ('ida', 'ncfs000'), ('y', 'cc00'), ('vuelta', 'ncfs000'), ('desde', 'sps00'), ('Bilbao', 'np0000l'), ('a', 'x'), ('Madrid', 'np0000l'), ('con', 'sps00'), ('fecha', 'ncfs000'), ('para', 'x'), ('el', 'da0ms0'), ('20', 'z0'), ('de', 'sps00'), ('mayo', 'npcs000')], 
    [('Buen', 'aq0ms0'), ('día', 'ncms000'), (',', 'fc'), ('necesito', 'vmip1s0'), ('un', 'di0ms0'), ('vuelo', 'ncms000'), ('de', 'sps00'), ('ida', 'ncfs000'), ('desde', 'sps00'), ('Málaga', 'np0000l'), ('a', 'x'), ('Barcelona', 'np0000l')], 
    [('Cómprame', 'vmis2s0'), ('1' ,'z0'), ('billete', 'ncms000'), ('de' ,'sps00'), ('Madrid' ,'np0000l'), ('a', 'x'), ('Burdeos' ,'np0000l')],
    [('quiero', 'vmip1s0'), ('un', 'di0ms0'), ('billete', 'ncms000'), ('desde', 'sps00'), ('Valencia', 'np0000l'), ('a', 'x'), ('Sydney', 'np0000l')],
    [('Quiero' ,'vmip1s0'), ('cuatro', 'mccp00'), ('billetes', 'ncmp000'), ('a', 'x'), ('Bali', 'np0000l')],
    [('Comprar', 'vmn0000'), ('5' ,'z0'), ('billetes' ,'ncmp000'), ('a' ,'x'), ('Madagascar', 'np0000l')],
    [("Hola", "i"), (",", "fc"), ("quisiera", "vmic1s0"), ("reservar", "vmn0000"), ("un", "di0ms0"), ("vuelo", "ncms000"), ("de", "sps00"), ("ida", "ncfs000"), ("y", "cc00"), ("vuelta", "ncfs000"), ("desde", "sps00"), ("Sevilla", "np0000l"), ("a", "x"), ("Valencia", "np0000l"), ("con", "sps00"), ("fecha", "ncfs000"), ("para", "x"), ("el", "da0ms0"), ("25", "z0"), ("de", "sps00"), ("septiembre", "npcs000"), (".", "fp")],
    [('Madrid', 'np00001'), ('-', 'Fg'), ('Barcelona', 'np00001'), ('en', 'sps00'), ('Qatar', 'np00001'), ('Airways', 'np00001')],
    [('hola', 'i'), (',', 'Fc'), ('quisiera', 'vmsi000'), ('reservar', 'vmn0000'), ('un', 'di3ms00'), ('vuelo', 'ncms000'), ('y', 'cc00'), ('vuelta', 'ncfs000'), ('desde', 'sps00'), ('la', 'tdfs0'), ('pampa', 'np00001'), ('a', 'x'), ('Ciudad','np00001' ), ('de','sps00' ), ('Mexico','np00001' ), ('con', 'sps00'), ('fecha', 'ncfs000'), ('para', 'x'), ('el', 'dams00'),('diez', 'mccp00'), ('de', 'sps00'), ('agosto', 'npcs000')],
    [('hola', 'i'), (',', 'Fc'), ('quisiera', 'vmsi000'), ('reservar', 'vmn0000'), ('un', 'di3ms00'), ('vuelo', 'ncms000'), ('y', 'cc00'), ('vuelta', 'ncfs000'), ('para', 'x'), ('Gran', 'np00001'), ('Canaria', 'np00001'), ('desde', 'sps00'), ('Los','tdmp0' ), ('Ángeles','np00001' ), ('con', 'sps00'), ('fecha', 'ncfs000'), ('para', 'x'), ('el', 'dams00'),('diez', 'mccp00'), ('de', 'sps00'), ('agosto', 'npcs000')]
]

In [6]:
#We generate the Train and Test sets
data_train, data_test = train_test_split(manual_train, test_size=0.20, random_state=1)

print('Train tokens:',len(data_train),
      '\nTokens test:    ',len(data_test))

Train tokens: 21 
Tokens test:     6


Having the sets already created, we move on to train the taggers.

To train the ngrams we must execute the tagger with the corpus, for example UnigramTagger(data_train). We will see that the ngrams can have as backoff another ngram.

In the case of HiddenMarkovModelTagger we must execute the function .train().

In [7]:
unigram  = UnigramTagger(data_train)
bigram   = BigramTagger(data_train, backoff=unigram)
trigram  = TrigramTagger(data_train, backoff=bigram)
hmm      = HiddenMarkovModelTagger.train(data_train)

Once the taggers have been trained, we are going to evaluate how each of them tends to perform with the test set. To evaluate it we have to use the train() function, for all the taggers. Let's see how each of them performs.

When you run the training, pay attention to the time it takes for each of the taggers to display the score. While the ngrams are quite fast to extract the information, the HMM takes longer to get the data.

In [8]:
print ('Hit with unigramas: %.2f %%' % (unigram.evaluate(data_test)*100))
print ('Hit with bigramas:  %.2f %%' % (bigram.evaluate(data_test)*100))
print ('Hit with trigramas: %.2f %%' % (trigram.evaluate(data_test)*100))
print ('Hit with HMMs:      %.2f %%' % (hmm.evaluate(data_test)*100))

Hit with unigramas: 74.42 %
Hit with bigramas:  76.74 %
Hit with trigramas: 70.93 %
Hit with HMMs:      81.40 %


Now, we can retrain the taggers with the test data. Although we will not see a big improvement in general terms, as the volume of data we are using is small, it will help.

We will see that if we evaluate the taggers again on the test set, we will get 100% accuracy. This improvement is maximum because all the data was used to train.

In [9]:
unigram  = UnigramTagger(data_test)
bigram   = BigramTagger(data_test, backoff=unigram)
trigram  = TrigramTagger(data_test, backoff=bigram)
hmm      = HiddenMarkovModelTagger.train(data_test)

In [10]:
print ('Acierto con unigramas: %.2f %%' % (unigram.evaluate(data_test)*100))
print ('Acierto con bigramas:  %.2f %%' % (bigram.evaluate(data_test)*100))
print ('Acierto con trigramas: %.2f %%' % (trigram.evaluate(data_test)*100))
print ('Acierto con HMMs:      %.2f %%' % (hmm.evaluate(data_test)*100))

Acierto con unigramas: 97.67 %
Acierto con bigramas:  100.00 %
Acierto con trigramas: 100.00 %
Acierto con HMMs:      97.67 %


In [11]:
hmm= HiddenMarkovModelTagger.train(cess_esp.tagged_sents())
hmm= HiddenMarkovModelTagger.train(manual_train)

frases_tags= []

for frases in frases_tokens:
    
    frases_tags.append(hmm.tag(frases))
    
frases_tags

[[('Quiero', 'vmip1s0'),
  ('2', 'z0'),
  ('billetes', 'ncmp000'),
  ('de', 'sps00'),
  ('Madrid', 'np0000l'),
  ('a', 'x'),
  ('Frankfurt', 'np00001'),
  ('en', 'sps00'),
  ('Septiembre', 'npcs000')],
 [('Necesito', 'vmip1s0'),
  ('comprar', 'vmn0000'),
  ('un', 'di0ms0'),
  ('billete', 'ncms000'),
  ('a', 'x'),
  ('Madrid', 'np0000l'),
  ('el', 'da0ms0'),
  ('5', 'z0'),
  ('de', 'sps00'),
  ('Agosto', 'npcs000')],
 [('Comprar', 'vmn0000'),
  ('billete', 'ncms000'),
  ('Barcelona', 'np00001'),
  ('a', 'x'),
  ('Roma', 'np00001'),
  ('para', 'x'),
  ('el', 'da0ms0'),
  ('25', 'z0'),
  ('de', 'sps00'),
  ('Agosto', 'npcs000'),
  ('con', 'sps00'),
  ('Iberia', 'np0000l')],
 [('Billete', 'ncms000'),
  ('barato', 'aq0ms0'),
  ('AirEuropa', 'np00001'),
  ('de', 'sps00'),
  ('Madrid', 'np0000l'),
  ('a', 'x'),
  ('Sevilla', 'np0000l')],
 [('Necesito', 'vmip1s0'),
  ('un', 'di0ms0'),
  ('billete', 'ncms000'),
  ('de', 'sps00'),
  ('Tenerife', 'np0000l'),
  ('a', 'x'),
  ('Jerez', 'np0000l')],

In [12]:
#Esto es una prueba
hmm.tag(pln.tokenizar( 'Necesito comprar un billete a Madrid el cinco de Agosto'))

[('necesito', 'vmip1s0'),
 ('comprar', 'vmn0000'),
 ('un', 'di0ms0'),
 ('billete', 'ncms000'),
 ('a', 'x'),
 ('madrid', 'np0000l'),
 ('el', 'da0ms0'),
 ('cinco', 'z0'),
 ('de', 'sps00'),
 ('agosto', 'npcs000')]

## Create the grammar

*para* & *a*  we tagged with an **x** to distingh 'Destino' from 'Origen'. That's because 'Origen' and 'Destino' start with the same structure (preposition followed by noun).

In [13]:
reglas= r'''

Destino: <x> {<np00001> <sps00> <np00001>}
Origen: <sps00> {<np00001> <sps00> <np00001>}
Aerolinea: <np.*> <sps00> {<np00001> <np00001>}

Fecha: {<z0> <sps00> <npcs000>}
Fecha: {<mc.*> <sps00> <np.*>}
Fecha: {<npcs000>}

Destino: <x> {<np.*> <np.*>} 
Destino: <x> {<np.*>} 
Destino: <Fg> {<np.*>}

Origen: <sps00> {<td.*> <np00001>}
Origen: <sps00> {<np00001>}
Origen: <sps00> {<np.*>}
Origen: <nc.*> {<np.*>}
Origen: {<np.*>} <Fg>


Aerolinea: <Destino> <sps00> {<np.*>}
Aerolinea: <Fecha> <sps00> {<np.*>}
Aerolinea: <nc.*> <aq.*> {<np.*>} 

NumeroDeBilletes: {<z0>}  <n.*>
NumeroDeBilletes: {<d.*>} <n.*>
NumeroDeBilletes: {<mc.*>} <n.*>

'''

parser= nltk.RegexpParser(reglas)

# Parser function
def parsear(phrase):
    return parser.parse(phrase)

We check that the parser works fine with the defined set of rules.

In [14]:
frase_regex = parsear(
    hmm.tag(
    tokenizar('Madrid - Barcelona en Qatar Airways')))

print(frase_regex)

(S
  (Origen Madrid/np00001)
  -/Fg
  (Destino Barcelona/np00001)
  en/sps00
  (Aerolinea Qatar/np00001 Airways/np00001))


## Functions

### Json(_tree)

In [15]:
def genera_json(_tree):
    # The genera_json function receives a parsed tree (_tree) as a parameter and generates a JSON object

    # We create a dictionary object with keys for each piece of information we want to extract from the tree

    result = {'Origen': None, 'Destino': None, 'Aerolinea': None , 'NumeroDeBilletes': None, 'Fecha': None}

    # We iterate over each node in the parsed tree

    for nodo in _tree:

        # If the node is not a tuple, we assume it contains a single word or punctuation mark
        # We concatenate all the words in the node to form a single string, and assign it to the appropriate key
        # in the result dictionary based on the node's label.

        if type(nodo) != tuple:

            count = 0
            valor = ''

            for elemento in nodo:
                count += 1
                palabra, categoria = elemento

                if count == 1:
                    valor = valor + palabra

                else:
                    valor = valor + ' ' + palabra


            if nodo.label() == 'Origen':

                result['Origen'] = valor

            if nodo.label() == 'Destino':
                result['Destino'] = valor

            if nodo.label() == 'Aerolinea':
                result['Aerolinea'] = valor

            if nodo.label() == 'NumeroDeBilletes':
                result['NumeroDeBilletes'] = valor

            if nodo.label() == 'Fecha':
                result['Fecha'] = valor 

    # We then check each key in the result dictionary to see if it has a value (i.e., if it was successfully extracted
    # from the parsed tree). If the value is None, we prompt the user to enter the missing information.

    for k,v in result.items():

        if v == None and k != 'Fecha':
            result[k]=input('Enter {}:'.format(k))

        if v == None and k == 'Fecha':
            result[k]=input('Enter {} in the correct format. Example: March 2:'.format(k))   

    # Finally, we return the completed result dictionary as a JSON object

    return result


In [16]:
#This is an example to check
genera_json(parsear(frase_regex))

Enter NumeroDeBilletes:1
Enter Fecha in the correct format. Example: March 2:March 2


{'Origen': 'Madrid',
 'Destino': 'Barcelona',
 'Aerolinea': 'Qatar Airways',
 'NumeroDeBilletes': '1',
 'Fecha': 'March 2'}

### Mic_conversion()

In [17]:
import speech_recognition as sr
import pyaudio

# We instantiate the speech recognizer
instancia = sr.Recognizer()

# We instantiate the Microphone function for the microphone works
mic= sr.Microphone()

In [18]:
def mic_conversion():
    """
    Esta función graba audio del micrófono, lo transcribe y devuelve la transcripción.
    """
    try:
        # We set up the microphone and clean up the ambient noise
        with mic as source:
            instancia.adjust_for_ambient_noise(source)

            # We record the audio and transcribe it using the Google Cloud Speech-to-Text API
            audio = instancia.listen(source)
            transcript = instancia.recognize_google(audio, language='es-ES', show_all = True)



            # We return the transcript and sentiment analysis
            return transcript['alternative'][0]['transcript']

    except sr.RequestError:
        print("No se pudo obtener una conexión a la API de Google Speech Recognition")
    except sr.UnknownValueError:
        print("No se pudo entender el audio")

### Speak()

In [19]:
import pyttsx3

# We instantiate pyttsx3
engine = pyttsx3.init()

#Speed setting
engine.setProperty('rate', 140)

#Language settings
engine.setProperty('voice', 'spanish')

In [20]:
# Function converting text to audio
def speak(texto):
    engine.say(texto)
    engine.runAndWait()

### Asistant()

In [21]:
def asistant():
    
    speak ("Hola, bienvenido a Skytotravel, ¿en que puedo ayudarle? ")
    

    user_input = mic_conversion()
    
    
    
    sentence_tokenized = tokenizar(user_input)
      
    sentence_tagged = hmm.tag(sentence_tokenized)
    
    sentence_parsed = parsear(sentence_tagged)
    
    
    result = {'Origen': None, 'Destino': None, 'Aerolinea': None , 'NumeroDeBilletes': None, 'Fecha': None}
    
    for nodo in sentence_parsed:
        
        if type(nodo) != tuple:
            
            count = 0
            valor = ''
        
            for elemento in nodo:
                count += 1
                palabra, categoria = elemento

                if count == 1:
                    valor = valor + palabra
                else:
                    valor = valor + ' ' + palabra

            if nodo.label() == 'Origen':

                result['Origen'] = valor

            if nodo.label() == 'Destino':
                result['Destino'] = valor
            if nodo.label() == 'Aerolinea':
                result['Aerolinea'] = valor
            if nodo.label() == 'NumeroDeBilletes':
                result['NumeroDeBilletes'] = valor
            if nodo.label() == 'Fecha':
                result['Fecha'] = valor 
     
    for k,v in result.items():
        
        if v == None and k != 'Fecha':
            speak('Ingrese {}:'.format(k))
            result[k]= mic_conversion()
            
        if v == None and k == 'Fecha':
            speak('Ingrese {}:'.format(k))
            result[k]= mic_conversion()          

    speak( "Perfecto, comienzo la busqueda de tu viaje desde {} a {} para la fecha {} con {}."
          .format(result["Origen"],result["Destino"], result["Fecha"], result["Aerolinea"],))
    
    return result

### Skytotravel()

In [22]:
def skytotravel():
    
    DatosBillete = asistant()
    
    speak('¿Están todos los datos correctos? Responde sí o no.')

    user_input2 = mic_conversion()
    
    afirmacion = ['si','sí']

    while user_input2 not in afirmacion:
        speak('Perdona, vuelve a realizar la consulta.')

        DatosBillete = asistant()

        speak('¿Están todos los datos correctos? Responde sí o no.')

        user_input2 = mic_conversion()
    
    speak('De acuerdo, procedo a realizar la búsqueda.')
    
    return DatosBillete   

## MySQL

In [23]:
import mysql.connector
import pandas as pd

### Create database

In [24]:
#Connections
warnings.filterwarnings("ignore")
db = mysql.connector.connect(
    host = "localhost",
    user = "root",
    password= "12345"
    
)

In [25]:
cursor = db.cursor()

In [None]:
cursor.execute("CREATE DATABASE sky2travel")
db.commit()

### Create table

In [27]:
#SQL Function
def sql(_query):
    return pd.read_sql_query(_query,db)


In [28]:
#Connections
warnings.filterwarnings("ignore")
db = mysql.connector.connect(
    host = "localhost",
    user = "root",
    password= "12345",
    database= "sky2travel"
    
)

In [29]:
cursor = db.cursor()

In [None]:
query_compras = """
    
    CREATE TABLE compras (
        
        id_compra int NOT NULL AUTO_INCREMENT,
        NumeroDeBilletes varchar(50),
        Destino varchar(50),
        Fecha varchar(50), 
        Origen varchar(50),
        Aerolinea varchar(50),
        PRIMARY KEY (id_compra)
    )
    
"""

cursor.execute(query_compras)
db.commit()

### Fill the table

In [31]:
def order_ticket():
    
    DatosBillete= skytotravel()
    
    query = """
    INSERT INTO compras
    (Origen, Destino, Aerolinea, NumeroDeBilletes, Fecha)
    VALUES(%s, %s, %s, %s, %s)
    """
    valores = tuple(DatosBillete.values())

    cursor.execute(query, valores)

    db.commit()
    

In [33]:
order_ticket()