---

# Basic sentence generation using a list of IDs as input
## Concept ID Database

### POS for qutr

* sent: complete sentences and greetings
* vp: verb phrases used as sentence templates, with `*` marking the empty noun-phrase slot
* np: nouns and noun phrases
* adj: adjectives
* adv: adverbs
* conj: conjunctions
* prt: particles or other function words
* num: cardinal numbers
* X: other

Tags adapted from "A Universal Part-of-Speech Tagset" by Slav Petrov, Dipanjan Das, and Ryan McDonald.
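A minimal sketch of a consistency check for the concept ID database, assuming each entry maps a phrase ID to a `(phrase, pos)` pair. `VALID_POS` and `unknown_tags` are hypothetical names. The tag set is the list above plus `phrs`, which `translate` tests for even though the list does not include it.

```python
# Tags from the POS list above, plus "phrs", which translate() tests for
# (assumption: "phrs" is the tag actually stored for standalone phrases)
VALID_POS = {"sent", "vp", "np", "adj", "adv", "conj", "prt", "num", "X", "phrs"}


def unknown_tags(table: dict[str, tuple[str, str]]) -> list[str]:
    """Return the phrase IDs whose POS tag is not in VALID_POS, sorted."""
    return sorted(pid for pid, (_, pos) in table.items() if pos not in VALID_POS)


# Hypothetical entries; "noun" is a deliberate mistag for "np"
sample = {
    "p15": ("Apple", "noun"),
    "p35": ("Do you have *?", "vp"),
}
print(unknown_tags(sample))  # ['p15']
```

Running a check like this whenever phrases are added keeps a typo in a tag from silently routing a phrase down the wrong branch of `translate`.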

In [1]:
import pandas as pd
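The phrase tables `en`, `cn` and `ar` used below are loaded in a cell not shown here. As a stand-in, this sketch rebuilds a fragment of `en` from the phrases printed in the example outputs. The columns `phrase` and `pos` are the ones `translate` reads; the `pos` values are inferred from how each phrase behaves in the outputs, not taken from the real database.

```python
import pandas as pd

# Fragment of the English phrase table, indexed by phrase ID.
# Phrases come from the printed outputs; POS tags are inferred (assumption).
en = pd.DataFrame(
    {
        "pid": ["p3", "p15", "p16", "p17", "p21", "p22", "p35", "p36"],
        "phrase": [
            "Good Morning", "Apple", "Mango", "How much are *?",
            "I would like *", "* kilograms", "Do you have *?", "I do not understand",
        ],
        "pos": ["phrs", "np", "np", "vp", "vp", "np", "vp", "phrs"],
    }
).set_index("pid")

print(en.loc["p21"].phrase)  # I would like *
```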

## Notes
* Queries need to be split into single sentences; longer queries are hard to parse.
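Because `translate` keeps only one template per query, a multi-sentence query silently loses all but its last template. A minimal guard could reject such queries before compiling them. It assumes the caller knows which IDs are templates; `TEMPLATE_IDS` and `check_single_sentence` are hypothetical, and the set is filled with the `vp` IDs seen in the examples.

```python
# Hypothetical set of template (vp) phrase IDs, taken from the examples
TEMPLATE_IDS = {"p17", "p21", "p35"}


def check_single_sentence(query: list) -> None:
    """Raise ValueError if the query holds more than one sentence template."""
    templates = [pid for pid in query if pid in TEMPLATE_IDS]
    if len(templates) > 1:
        raise ValueError(f"query has {len(templates)} templates: {templates}; split it first")


check_single_sentence(["p21", 5, "p22"])  # passes: one template
# check_single_sentence(["p21", 5, "p22", "p35", "p15"])  # would raise ValueError
```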

In [23]:
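## General translate method for all languages. `target` is the phrase table for the
## target language: a DataFrame indexed by phrase ID, with `phrase` and `pos` columns.
def translate(query, target):
    final = ""        # compiled output sentence(s)
    noun_phrase = ""  # numbers and noun phrases collected to fill the template slot
    temp = ""         # the sentence template containing the "*" slot
    raw = ""          # the looked-up phrases, for logging

    for pid in query:  # pid: phrase ID, or a bare int for a quantity
        if isinstance(pid, int):
            noun_phrase += str(pid)
            raw += str(pid) + " | "
        else:
            prow = target.loc[pid]
            raw += prow.phrase + " | "
            if prow.pos == "phrs":
                # standalone phrases (greetings, complete sentences) are emitted as-is
                final += prow.phrase + "\t"
            elif "*" in prow.phrase and prow.pos == "vp":
                # only one template per query is supported; a later one overwrites an earlier one
                temp = prow.phrase
            else:
                # strip the slot marker and lowercase the noun phrase
                noun_phrase += prow.phrase.replace("*", "").lower()

    if temp != "":
        final += temp.replace("*", noun_phrase)
    else:
        # no template: keep the bare noun phrase instead of silently dropping it
        final += noun_phrase

    print("Query IDs were: " + raw + "\n")
    print(final)
    print("---\n")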
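`translate` prints its result, which makes it awkward to test. The same slot-filling logic can be sketched as a pure function that returns the compiled string. Here the phrase table is assumed to be a plain dict mapping IDs to `(phrase, pos)` pairs instead of a DataFrame, and `compile_ids` is a hypothetical name. Unlike `translate`, it joins the pieces with tabs rather than leaving a trailing tab after each standalone phrase.

```python
def compile_ids(query: list, table: dict[str, tuple[str, str]]) -> str:
    """Compile a list of phrase IDs (and bare ints) into one output string.

    Mirrors translate(): standalone 'phrs' entries are emitted in order,
    numbers and noun phrases are concatenated into the slot filler, and the
    'vp' template with a '*' slot receives that filler.
    """
    parts: list[str] = []
    filler = ""
    template = ""
    for pid in query:
        if isinstance(pid, int):
            filler += str(pid)
            continue
        phrase, pos = table[pid]
        if pos == "phrs":
            parts.append(phrase)
        elif pos == "vp" and "*" in phrase:
            template = phrase  # a later template overwrites an earlier one
        else:
            filler += phrase.replace("*", "").lower()
    # fill the slot if there is a template, otherwise keep the bare filler
    parts.append(template.replace("*", filler) if template else filler)
    return "\t".join(p for p in parts if p)


# Entries copied from the English outputs; POS tags are inferred (assumption)
table = {
    "p3": ("Good Morning", "phrs"),
    "p15": ("Apple", "np"),
    "p21": ("I would like *", "vp"),
    "p22": ("* kilograms", "np"),
    "p35": ("Do you have *?", "vp"),
}
print(compile_ids(["p21", 5, "p22"], table))           # I would like 5 kilograms
print(repr(compile_ids(["p3", "p35", "p15"], table)))  # 'Good Morning\tDo you have apple?'
```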

### Example sentence compilation

In [24]:
translate(["p21", 5, "p22"], en)
translate(["p36"], en)
translate(["p3", "p35", "p15"], en)
translate(["p16", "p17"], en)

Query IDs were: I would like * | 5 | * kilograms | 

I would like 5 kilograms
---

Query IDs were: I do not understand | 

I do not understand	
---

Query IDs were: Good Morning | Do you have *? | Apple | 

Good Morning	Do you have apple?
---

Query IDs were: Mango | How much are *? | 

How much are mango?
---



In [25]:
translate(["p21", 5, "p22"], cn)
translate(["p36"], cn)
translate(["p3", "p35", "p15"], cn)
translate(["p16", "p17"], cn)

Query IDs were: 我要* | 5 | *公斤 | 

我要5公斤
---

Query IDs were: 我听不懂 | 

我听不懂	
---

Query IDs were: 早上好 | 有没有*？ | 苹果 | 

早上好	有没有苹果？
---

Query IDs were: 芒果 | *多少钱？ | 

芒果多少钱？
---



In [26]:
translate(["p21", 5, "p22"], ar)
translate(["p36"], ar)
translate(["p3", "p35", "p15"], ar)
translate(["p16", "p17"], ar)

Query IDs were: أود * | 5 | *كلغ | 

أود 5كلغ
---

Query IDs were: لا افهم | 

لا افهم	
---

Query IDs were: صباح الخير | هل تمتلك *؟ | تفاح | 

صباح الخير	هل تمتلك تفاح؟
---

Query IDs were: مانجو | كم هي *؟ | 

كم هي مانجو؟
---
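The three runs above use the same ID sequences; only the phrase table changes. Word order comes from where each language's template puts the `*` slot, and spacing comes from whether the stored phrase itself contains a space (`* kilograms` vs. `*公斤`). A minimal sketch of just the substitution step, using phrases copied from the outputs above:

```python
# (template, noun) pairs copied from the p16/p17 outputs above
examples = {
    "en": ("How much are *?", "Mango"),
    "cn": ("*多少钱？", "芒果"),
    "ar": ("كم هي *؟", "مانجو"),
}

for lang, (template, noun) in examples.items():
    # the slot position, not the ID order, decides where the noun lands
    print(lang, template.replace("*", noun.lower()))
# en How much are mango?
# cn 芒果多少钱？
# ar كم هي مانجو؟

# unit phrases from the p22 outputs: the English entry stores its own space
units = {"en": "* kilograms", "cn": "*公斤"}
for lang, unit in units.items():
    print(lang, "5" + unit.replace("*", ""))
# en 5 kilograms
# cn 5公斤
```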



---

## Jan 28 Notes

### Implementing a multilingual phrase compilation model

How can a single model compose correct sentences in every language?

* Importance of a clean, well-organized database
* Will the database scale as more concepts and languages are added?
* How to specify gender and age: by adding more concept phrases, or by inflecting existing phrases?
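One way to try the "more concept phrases" option without changing the ID scheme is to store inflected variants under a `(phrase ID, feature)` key and fall back to the base form when no variant exists; the same key could carry an age feature. This is a sketch only: the feature keys, the `lookup` helper, and the feminine Arabic variant are illustrative assumptions, not entries from the database.

```python
from typing import Optional

# Base phrases keyed by ID (Arabic p35, copied from the outputs above)
BASE = {"p35": "هل تمتلك *؟"}

# Hypothetical inflected variants keyed by (ID, feature);
# assumed feminine second-person form of the verb (adds the -ين suffix)
VARIANTS = {("p35", "f"): "هل تمتلكين *؟"}


def lookup(pid: str, gender: Optional[str] = None) -> str:
    """Return the variant for (pid, gender) if one exists, else the base phrase."""
    if gender is not None and (pid, gender) in VARIANTS:
        return VARIANTS[(pid, gender)]
    return BASE[pid]


print(lookup("p35"))       # base form
print(lookup("p35", "f"))  # feminine variant
print(lookup("p35", "m"))  # no "m" variant stored, so falls back to the base form
```

The fallback means only the phrases that actually inflect need extra rows, which keeps the database from growing by a full copy per feature.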

---