A LangChain-powered French learning tool that transforms French text into rich, bilingual study materials with vocabulary highlights, grammar analysis, audio pronunciation, and exports to Notion or PDF.

    ┌─────────────────────────────────────────────────────────────┐
    │                          index.ts                           │
    │                     (Entry point / CLI)                     │
    └─────────────────────┬───────────────────────────────────────┘
                          │
                          ▼
    ┌─────────────────────────────────────────────────────────────┐
    │                         workflow.ts                         │
    │            LangChain RunnableSequence (3 steps)             │
    │  ┌──────────┐   ┌───────────────┐   ┌────────────────┐      │
    │  │ splitStep│ → │translationStep│ → │ enrichmentStep │      │
    │  └──────────┘   └───────────────┘   └────────────────┘      │
    └─────────────────────┬───────────────────────────────────────┘
                          │
            ┌─────────────┼─────────────┐
            ▼             ▼             ▼
     ┌─────────────┐ ┌────────────┐ ┌────────────┐
     │translator.ts│ │ analysis.ts│ │  audio.ts  │
     │ (LangChain  │ │ (LangChain │ │(Azure TTS) │
     │   Chain)    │ │   Chain)   │ │            │
     └─────────────┘ └────────────┘ └────────────┘

| Component | File | Purpose |
|---|---|---|
| `RunnableSequence` | `workflow.ts` | Chains split → translate → enrich steps |
| `RunnableLambda` | `workflow.ts` | Wraps async functions as composable runnables |
| `AzureChatOpenAI` | `translator.ts`, `analysis.ts` | Azure OpenAI model wrapper |
| `ChatPromptTemplate` | `translator.ts`, `analysis.ts` | Structured prompts for LLM |
| `StructuredOutputParser` | `analysis.ts` | Zod-validated JSON parsing |
- Input: French `.txt` file
- Split: Text → array of sentences
- Translate (LangChain): Each sentence → `{ translation, vocabulary[] }` (parallel processing)
- Enrich: Add audio paths, assign colors to vocabulary
- Analyze (LangChain): Full article → difficulty level (A1-B2) + grammar highlights
- Output: Notion page / PDF with bilingual content
- Install dependencies: `npm install`
- Configure environment variables: create a `.env` file in the root directory (if it doesn't exist) and add your credentials:

      NOTION_TOKEN=your_notion_integration_token
      NOTION_PAGE_ID=your_parent_page_id

      # Azure OpenAI Configuration
      AZURE_OPENAI_ENDPOINT=your_azure_openai_endpoint
      AZURE_OPENAI_API_KEY=your_azure_openai_api_key
      AZURE_OPENAI_DEPLOYMENT_NAME=your_deployment_name

      # Azure Speech (Text-to-Speech)
      AZURE_SPEECH_KEY=your_speech_key
      AZURE_SPEECH_REGION=your_speech_region
      AZURE_SPEECH_VOICE=fr-FR-VivienneNeural

- Place your French text files (`.txt` format) in the `input` folder.
- Run the tool: `npm start`. The tool automatically processes all `.txt` files in the `input` directory and creates Notion pages.
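The variables listed above can be checked at startup, before any API call is made. The following is a minimal sketch, not the tool's own code: the helper names `missingEnvVars` and `speechEnabled` are hypothetical, and treating only the Notion and Azure OpenAI variables as required is an assumption based on audio generation being described as optional.

```typescript
// Hypothetical startup check; the required/optional split is an assumption.
const REQUIRED_VARS = [
  "NOTION_TOKEN",
  "NOTION_PAGE_ID",
  "AZURE_OPENAI_ENDPOINT",
  "AZURE_OPENAI_API_KEY",
  "AZURE_OPENAI_DEPLOYMENT_NAME",
] as const;

type Env = Record<string, string | undefined>;

// Returns the names of required variables that are unset or blank.
function missingEnvVars(env: Env): string[] {
  return REQUIRED_VARS.filter((name) => !env[name]?.trim());
}

// Speech is treated as enabled only when both the key and the region are present.
function speechEnabled(env: Env): boolean {
  return Boolean(env["AZURE_SPEECH_KEY"]?.trim() && env["AZURE_SPEECH_REGION"]?.trim());
}
```

Calling `missingEnvVars(process.env)` at the top of the entry point and exiting with a clear message when the result is non-empty would surface configuration mistakes before any text is processed.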
- Reads French text files from the `input` folder.
- Splits content into sentences.
- Translates each sentence to Chinese using Azure OpenAI.
- Extracts 3-5 key vocabulary words from each sentence.
- Highlights key vocabulary with random colors (bold + underline) in both French and Chinese.
- Creates a new page in Notion under the specified parent page.
- Displays each sentence with its translation in a two-column layout (French | Chinese).
- Includes a vocabulary list at the end with color-coded terms and a fill-in-the-blank practice section.
- Caches Azure OpenAI translation/vocabulary output per article title to skip repeated LLM calls on subsequent runs.
- Generates Azure Speech audio files for every sentence plus a full-article track, caching them on disk (audio stays local and is not embedded in Notion).
- Builds adaptive practice exercises (fill-in-the-blank, multiple choice, listening) and saves them to `output/exercises` for use outside Notion.
- Coordinates sentence translation, vocabulary extraction, audio, and highlighting via a LangChain workflow so each step can be extended or reused.
The generation flow is implemented as a LangChain workflow (`buildArticleWorkflow`) that chains three runnable steps:
- Sentence Splitter (`splitStep`) – Takes the raw article text, splits it into French sentences, and validates that the article isn't empty.
- Translation Chain (`translationStep`) – For each sentence, runs a LangChain `AzureChatOpenAI` structured-output chain that returns the Chinese translation plus 3–5 vocabulary pairs. Sentences are processed in parallel for improved performance. Results are cached to skip repeated LLM calls.
- Enrichment & Audio (`enrichmentStep`) – Assigns highlight colors, generates optional Azure Speech `.wav` files (also in parallel), and packages the result into a `ProcessedArticle` consumed by the Notion/PDF exporters.
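The three steps above can be sketched in plain TypeScript without any LangChain dependency. This is an illustrative stand-in, not the tool's actual `buildArticleWorkflow`: the `Translator` type, the `runArticle` function, the data shapes, the splitting regex, and the color palette are all assumptions. The real translation step calls Azure OpenAI rather than a function supplied by the caller, and the sketch cycles through a fixed palette instead of picking random colors so that its output is predictable.

```typescript
// Shapes are assumptions modeled on the `{ translation, vocabulary[] }` result.
interface VocabPair { french: string; chinese: string; color?: string }
interface TranslatedSentence { french: string; translation: string; vocabulary: VocabPair[] }
type Translator = (sentence: string) => Promise<Omit<TranslatedSentence, "french">>;

// Step 1: naive split on sentence-ending punctuation; rejects empty articles.
function splitStep(text: string): string[] {
  const sentences = (text.match(/[^.!?…]+[.!?…]*/g) ?? [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  if (sentences.length === 0) throw new Error("Article is empty");
  return sentences;
}

// Step 2: translate all sentences concurrently; Promise.all preserves input order.
async function translationStep(
  sentences: string[],
  translate: Translator,
): Promise<TranslatedSentence[]> {
  return Promise.all(
    sentences.map(async (french) => ({ french, ...(await translate(french)) })),
  );
}

// Step 3: attach a highlight color to each vocabulary pair (palette is made up).
const PALETTE = ["red", "blue", "green", "orange", "purple"];
function enrichmentStep(items: TranslatedSentence[]): TranslatedSentence[] {
  return items.map((item) => ({
    ...item,
    vocabulary: item.vocabulary.map((v, i) => ({ ...v, color: PALETTE[i % PALETTE.length] })),
  }));
}

// Compose the three steps in order: split, then translate, then enrich.
async function runArticle(text: string, translate: Translator): Promise<TranslatedSentence[]> {
  return enrichmentStep(await translationStep(splitStep(text), translate));
}
```

The regex split is deliberately simple and will mis-split abbreviations such as "M. Dupont"; it is only meant to show where validation and parallelism sit in the sequence.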
A separate LangChain workflow (`analyzeGrammarHighlights`) analyzes the full article to:
- Determine CEFR difficulty level (A1, A2, B1, B2)
- Extract 3-5 key grammar patterns with explanations in Chinese
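The analysis result described above can be guarded with a small runtime check before it reaches the exporters. The tool is described as using Zod-validated parsing; the sketch below is a dependency-free approximation of that idea, and the field names `level`, `highlights`, `pattern`, and `explanation` are assumptions rather than the tool's real schema.

```typescript
const CEFR_LEVELS = ["A1", "A2", "B1", "B2"] as const;
type CefrLevel = (typeof CEFR_LEVELS)[number];

// Hypothetical result shape; the real schema lives in analysis.ts.
interface GrammarHighlight { pattern: string; explanation: string }
interface ArticleAnalysis { level: CefrLevel; highlights: GrammarHighlight[] }

// Type guard: accepts only a known CEFR level and 3 to 5 well-formed highlights.
function isArticleAnalysis(value: unknown): value is ArticleAnalysis {
  if (typeof value !== "object" || value === null) return false;
  const { level, highlights } = value as Record<string, unknown>;
  if (typeof level !== "string" || !CEFR_LEVELS.some((l) => l === level)) return false;
  if (!Array.isArray(highlights) || highlights.length < 3 || highlights.length > 5) return false;
  return highlights.every(
    (h) =>
      typeof h === "object" &&
      h !== null &&
      typeof (h as Record<string, unknown>)["pattern"] === "string" &&
      typeof (h as Record<string, unknown>)["explanation"] === "string",
  );
}
```

Rejecting malformed model output at this boundary keeps the Notion and PDF exporters from having to handle partial data.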
Because the workflow is composed with LangChain's runnable abstractions, you can inject additional steps (e.g., quizzes, flashcards) or swap in different models without touching the CLI entry point.
- Every time an article is processed, the raw translation + vocabulary data returned by Azure OpenAI is stored inside the `cache/` folder (one JSON file per title).
- When the same file/title is processed again, the tool reuses the cached LLM data and only reassigns highlight colors before creating Notion documents.
- If you edit the source text but keep the same filename, the script automatically detects that the cached French sentences no longer match and regenerates the translations.
- To force a fresh run manually, delete the corresponding file inside `cache/` (or remove the entire folder) before running `npm start`.
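The staleness check described above amounts to comparing the cached French sentences with the freshly split ones. A rough sketch follows, assuming a hypothetical cache entry shape (`CacheEntry`) and helper name (`isCacheFresh`); the tool's actual cache file format is not documented here.

```typescript
// Hypothetical shape of one cached article (one JSON file per title).
interface CacheEntry {
  title: string;
  sentences: { french: string; translation: string }[];
}

// The cache is reusable only if every French sentence matches, in order.
function isCacheFresh(entry: CacheEntry | undefined, current: string[]): boolean {
  if (!entry || entry.sentences.length !== current.length) return false;
  return entry.sentences.every((s, i) => s.french === current[i]);
}
```

When `isCacheFresh` returns `false`, the translations would be regenerated and the cache file overwritten, which matches the behavior described for edited source files.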
- When `AZURE_SPEECH_KEY` and `AZURE_SPEECH_REGION` are configured, the script uses Azure Cognitive Services Speech to synthesize each French sentence and a full-article `.wav` with the configured `AZURE_SPEECH_VOICE` (defaults to `fr-FR-VivienneNeural`).
- Audio files are stored under `output/audio/<article-title>/sentence-XX_<voice>.wav` plus `article-full_<voice>.wav`. Paths are logged to the console, but clips are not attached to Notion; play or upload them wherever you prefer.
- Audio generation is cached: existing `.wav` files are reused automatically, and only missing clips are synthesized on later runs.
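The naming scheme above maps onto a few small helpers. This sketch assumes two-digit zero padding for `XX`, 1-based sentence numbering, and that the article title is used unchanged as the directory name; none of these details is confirmed by the text. `clipsToSynthesize` is likewise a hypothetical name, showing how only the missing clips might be selected.

```typescript
// Builds output/audio/<article-title>/sentence-XX_<voice>.wav (padding is assumed).
function sentenceAudioPath(title: string, index: number, voice: string): string {
  const num = String(index).padStart(2, "0");
  return `output/audio/${title}/sentence-${num}_${voice}.wav`;
}

// Builds the path of the full-article track for the same title and voice.
function fullArticleAudioPath(title: string, voice: string): string {
  return `output/audio/${title}/article-full_${voice}.wav`;
}

// Given the clips already on disk, returns only the paths that still need synthesis.
function clipsToSynthesize(expected: string[], existing: Set<string>): string[] {
  return expected.filter((p) => !existing.has(p));
}
```

Because the voice name is part of the filename, switching `AZURE_SPEECH_VOICE` under this scheme would produce a new set of files rather than overwrite the old ones.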
- After an article is processed, a LangChain workflow plans which practice items to create based on sentence complexity and available audio.
- Exercise types currently include fill-in-the-blank, multiple choice, and listening prompts that reference the cached audio files.
- Generated exercises are written as JSON to `output/exercises/<article>.json`, so you can import them into flashcard tools or custom study apps without affecting the Notion page layout.
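A fill-in-the-blank item of the kind listed above could be derived from a sentence and one of its vocabulary words. The JSON field names (`type`, `prompt`, `answer`, `audioPath`) and the `makeFillInBlank` helper are assumptions for illustration; the actual schema written to `output/exercises/<article>.json` may differ.

```typescript
// Hypothetical exercise shape; field names are illustrative only.
interface FillInBlankExercise {
  type: "fill-in-the-blank";
  prompt: string;
  answer: string;
  audioPath?: string;
}

// Replaces the first occurrence of `word` with a blank; returns null if it is absent.
function makeFillInBlank(
  sentence: string,
  word: string,
  audioPath?: string,
): FillInBlankExercise | null {
  const at = sentence.indexOf(word);
  if (word.length === 0 || at === -1) return null;
  const prompt = sentence.slice(0, at) + "____" + sentence.slice(at + word.length);
  return { type: "fill-in-the-blank", prompt, answer: word, ...(audioPath ? { audioPath } : {}) };
}
```

Serializing an array of such items with `JSON.stringify(items, null, 2)` yields a file that flashcard tools or custom study apps can consume directly.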