
# Talk to Alpaca-LoRA

This notebook contains minimal code for running [Alpaca-LoRA](https://github.com/tloen/alpaca-lora/) for demonstration purposes. Please check the repo for more details.

In [None]:
!pip install bitsandbytes
!pip install -q datasets loralib sentencepiece
!pip install -q git+https://github.com/zphang/transformers@c3dc391
!pip install -q git+https://github.com/huggingface/peft.git


In [None]:
from peft import PeftModel
from transformers import LLaMATokenizer, LLaMAForCausalLM, GenerationConfig

tokenizer = LLaMATokenizer.from_pretrained("decapoda-research/llama-7b-hf")
model = LLaMAForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    load_in_8bit=True,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, "tloen/alpaca-lora-7b")

In [3]:
def generate_prompt(instruction, input=None):
    if input:
        return f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response:"""
    else:
        return f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:"""

In [4]:
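generation_config = GenerationConfig(
    # temperature and top_p only affect sampling; do_sample is not set here,
    # so generation is expected to rely on beam search with num_beams=4.
    temperature=0.1,
    top_p=0.75,
    num_beams=4,
)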

def evaluate(instruction, input=None):
    """Run one instruction through the model and return ("Response:", text)."""
    prompt = generate_prompt(instruction, input)
    inputs = tokenizer(prompt, return_tensors="pt")
    input_ids = inputs["input_ids"].cuda()
    generation_output = model.generate(
        input_ids=input_ids,
        generation_config=generation_config,
        return_dict_in_generate=True,
        output_scores=True,
        max_new_tokens=256,
    )
    # Only one sequence is returned by default, so decode the first one.
    output = tokenizer.decode(generation_output.sequences[0])
    # Keep only the text that follows the response marker of the prompt template.
    return ("Response:", output.split("### Response:")[1].strip())
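
The `evaluate` function above indexes `[1]` after splitting on `### Response:`, which raises an `IndexError` if the marker is missing from the decoded text. A minimal sketch of a more defensive extraction follows. `extract_response` is a hypothetical helper, not part of Alpaca-LoRA, and the default `special_tokens` are an assumption about which BOS/EOS strings might appear in the decoded output.

```python
RESPONSE_MARKER = "### Response:"

def extract_response(decoded, special_tokens=("<s>", "</s>")):
    """Return the text after the first response marker, or "" if it is absent.

    Hypothetical helper: `special_tokens` are assumed BOS/EOS strings to strip.
    """
    # str.partition does not raise when the marker is missing;
    # it returns an empty separator instead.
    _, marker, tail = decoded.partition(RESPONSE_MARKER)
    if not marker:
        return ""
    for token in special_tokens:
        tail = tail.replace(token, "")
    return tail.strip()

print(extract_response("### Instruction:\nSay hi\n\n### Response:\nHi there!</s>"))
# prints: Hi there!
```

Returning an empty string for a missing marker lets the caller decide how to treat a malformed generation instead of crashing partway through a long loop.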

In [None]:
!pip install -U deep-translator

In [6]:
import json
from deep_translator import GoogleTranslator

# Load the article and video metadata; `with` closes the files after reading.
with open('articles.json') as f_articles:
    articles_data = json.load(f_articles)
with open('videos.json') as f_videos:
    videos_data = json.load(f_videos)

article_ids = list(articles_data)
video_ids = list(videos_data)

def article_keywords(article_id):
    return articles_data[article_id]["keywords"]

def video_keywords(video_id):
    return videos_data[video_id]["keywords"]

translator = GoogleTranslator(source='es', target='en')

# Translate each video's keywords once instead of once per article.
videos = [' '.join(video_keywords(video_id)) for video_id in video_ids]
en_videos = [translator.translate(video) for video in videos]

for i, article_id in enumerate(article_ids):
    article = ' '.join(article_keywords(article_id))
    en_article = translator.translate(article)
    for j, (video, en_video) in enumerate(zip(videos, en_videos)):
        response = evaluate(f'''is this set of words "{en_article}" related to this other set of words "{en_video}"''')
        # Print only the pairs the model judges to be related.
        if 'yes' in response[1].lower():
            print(f'''Article {i}: {article}\nVideo {j}: {video}\n{response[1]}\n\n''')


Article 0: S 80 Armada Espanola submarinos renovar flota Ministerio de Defensa sumergibles flotabilidad riesgo hundimiento balance de pesos agrandado asesores estadounidenses muelles de atraque base naval de Cartagena presupuesto sobrecostos mercado internacional.
Video 5: siento es la desconexion con sus amigos no poder verlos todos los dias y compartir con ellos como solian hacerlo. Tambien experimentan miedo e incertidumbre por el futuro por lo que les deparara la situacion causada por la epidemia en cuanto a su educacion y su vida en general. En cuanto a las medidas implementadas para la educacion a distancia algunos se han adaptado bien y han respondido positivamente mientras que otros han tenido dificultades para acceder a la tecnologia necesaria o para seguir el ritmo de las clases online.
Yes, this set of words is related to the other set of words.


Article 0: S 80 Armada Espanola submarinos renovar flota Ministerio de Defensa sumergibles flotabilidad riesgo hundimiento balanc

KeyboardInterrupt: ignored
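
The keyword-matching loop above treats any response containing the substring `yes` as a match, which would also accept words such as "yesterday" or "eyes". Below is a rough sketch of a stricter check, assuming the model states its verdict at the start of the answer. `is_affirmative` is a hypothetical helper, not something this notebook or Alpaca-LoRA provides.

```python
import re

def is_affirmative(response):
    """Return True if the response starts with the standalone word "yes"."""
    # \b keeps "yes" from matching the start of longer words like "yesterday".
    return re.match(r"\s*yes\b", response, flags=re.IGNORECASE) is not None

print(is_affirmative("Yes, this set of words is related to the other set of words."))  # True
print(is_affirmative("No, my eyes see no relation."))  # False
```

This only tightens the string test; it does not make the model's verdict itself any more reliable.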

In [None]:
import json
from deep_translator import GoogleTranslator

# Load the article and video metadata; `with` closes the files after reading.
with open('articles.json') as f_articles:
    articles_data = json.load(f_articles)
with open('videos.json') as f_videos:
    videos_data = json.load(f_videos)

article_ids = list(articles_data)
video_ids = list(videos_data)

def article_keywords(article_id):
    return articles_data[article_id]["keywords"]

def video_keywords(video_id):
    return videos_data[video_id]["keywords"]

def article_text(article_id):
    return articles_data[article_id]["text"]

def video_text(video_id):
    return videos_data[video_id]["text"]

translator = GoogleTranslator(source='es', target='en')

# Compare the full text of the first article against every video.
# As with the keywords, the join assumes "text" holds a list of strings.
article = ' '.join(article_text(article_ids[0]))
en_article = translator.translate(article)

for i, video_id in enumerate(video_ids):
    en_video = translator.translate(' '.join(video_text(video_id)))
    response = evaluate(f'''is this text "{en_article}" related to this other text "{en_video}"''')
    # evaluate() returns its result rather than printing it, so print it here.
    print(f"{i}.     {response[1]}")