AI Model Evaluator | Data Analyst | Bilingual EN/PT 🇧🇷🇵🇹
Based in Lisbon, Portugal. I evaluate and improve LLM outputs — detecting hallucinations, designing annotation rubrics, and building bilingual EN/PT evaluation datasets.
| Area | Tools & Skills |
|---|---|
| LLM Evaluation | RAGAS · DeepEval · LLM-as-Judge · Hallucination Detection |
| Annotation | Rubric Design · Cohen's Kappa · Bilingual EN/PT Datasets |
| RAG Pipelines | LangChain · FAISS · OpenAI API · Golden Datasets |
| Data Analysis | Python · SQL · Power BI · ETL |
🔍 RAG Hallucination Detector
End-to-end RAG pipeline with automated hallucination detection. Bilingual EN/PT golden dataset. Evaluated with RAGAS metrics.
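RAGAS exposes several metrics. One commonly used for hallucination detection is faithfulness, which the RAGAS documentation describes as the share of claims in the generated answer that can be inferred from the retrieved context. The formula below is for reference only. It does not say which RAGAS metrics or versions this project uses:

```math
\text{Faithfulness} = \frac{|\text{claims in the answer supported by the retrieved context}|}{|\text{claims in the answer}|}
```

A score of 1.0 means every claim is grounded in the context. Lower scores point to likely hallucinated content.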
📝 Bilingual LLM Annotation Testset
100+ manually annotated EN/PT prompt-response pairs. 5-dimension rubric. Inter-annotator agreement (IAA) measured with Cohen's Kappa.
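Cohen's Kappa corrects the raw agreement between two annotators for the agreement expected by chance: `kappa = (p_o - p_e) / (1 - p_e)`. Below is a minimal standard-library sketch of that formula. It is not this project's code. The function name `cohens_kappa` and the `faithful`/`hallucinated` labels are hypothetical, chosen only for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items (hypothetical helper)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items where both annotators chose the same label
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: built from each annotator's own label frequencies
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:
        # Degenerate case: both annotators used one identical label for every item
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Toy labels for illustration only, not data from the testset
a = ["faithful", "faithful", "hallucinated", "faithful", "hallucinated", "faithful"]
b = ["faithful", "hallucinated", "hallucinated", "faithful", "hallucinated", "faithful"]
print(round(cohens_kappa(a, b), 2))  # → 0.67
```

Here the annotators agree on 5 of 6 items (p_o ≈ 0.83), and chance agreement from their label frequencies is 0.5. That gives kappa ≈ 0.67. For real projects, scikit-learn's `sklearn.metrics.cohen_kappa_score` computes the same statistic.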
📊 LLM Eval Dashboard
Interactive dashboard comparing hallucination rates across models and languages (EN vs PT).
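A dashboard like this ultimately aggregates per-response hallucination flags into one rate for each model and language. The sketch below shows only that aggregation step in plain Python. It assumes each evaluated response is a record with hypothetical `model`, `language` and boolean `hallucinated` fields. The model names and flags are made-up toy inputs, not results from the dashboard.

```python
from collections import defaultdict


def hallucination_rates(records):
    """Share of responses flagged as hallucinated, per (model, language) pair."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for record in records:
        key = (record["model"], record["language"])
        totals[key] += 1
        flagged[key] += int(record["hallucinated"])
    # Every key in totals has at least one response, so the division is safe
    return {key: flagged[key] / totals[key] for key in totals}


# Toy records with hypothetical model names, for illustration only
records = [
    {"model": "model-a", "language": "EN", "hallucinated": False},
    {"model": "model-a", "language": "EN", "hallucinated": True},
    {"model": "model-a", "language": "PT", "hallucinated": True},
    {"model": "model-a", "language": "PT", "hallucinated": True},
    {"model": "model-b", "language": "EN", "hallucinated": False},
    {"model": "model-b", "language": "PT", "hallucinated": False},
]
rates = hallucination_rates(records)
for (model, lang), rate in sorted(rates.items()):
    print(f"{model} {lang}: {rate:.0%}")
```

Keying on the `(model, language)` pair keeps EN and PT results side by side for each model, which is the comparison the dashboard is built around.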
- 🇺🇸 English — Intermediate
- 🇧🇷🇵🇹 Portuguese — Native (Brazil & Portugal)