# fsdl_2022_course_project

> Course Co-pilot helps course creators create lesson summaries and chapter markers using ML.

Our project is an ML-assisted tool that course creators can use to streamline the generation of lecture summaries and chapter markers from lesson videos.

::: {.callout-tip appearance="simple"}

Do check out our [video demo showcasing Course Co-pilot](https://www.loom.com/share/ca26fff6b911478c879da679495e8f67).

:::


![Functionalities of Course Co-pilot](https://user-images.githubusercontent.com/24592806/195798582-4ef1cd04-24d9-427e-bb0f-3fbb41e1a0e3.png)

The basic workflow is:

1. User opens a link to a YouTube video lecture in our application and asks Course Co-pilot to process it.
2. User can view the status of requests via the “Get Predictions” button.
3. User can view predicted topic boundaries, headlines, and content summaries for processed videos.
4. User can correct and save generated content (planned later, for use in a data flywheel).
5. User will be able to export results as chapter markers to use in YouTube (planned later).
6. User will be able to export results in a Quarto-friendly format for posting to a web page or blog (planned later).
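
The chapter-marker export in step 5 is not implemented yet. As a rough illustration of what it could look like, here is a minimal sketch that turns predicted topic segments into the `timestamp title` lines YouTube reads from a video description. `TopicSegment`, `format_timestamp`, and `to_chapter_markers` are hypothetical names made up for this example, not part of `course_copilot`.

```python
from dataclasses import dataclass


@dataclass
class TopicSegment:
    """Hypothetical container for one predicted topic (not the project's schema)."""

    start_seconds: float
    headline: str


def format_timestamp(seconds: float) -> str:
    """Format a number of seconds as M:SS, or H:MM:SS once past one hour."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def to_chapter_markers(segments: list[TopicSegment]) -> str:
    """Return one 'timestamp headline' line per segment, ordered by start time."""
    ordered = sorted(segments, key=lambda s: s.start_seconds)
    return "\n".join(
        f"{format_timestamp(s.start_seconds)} {s.headline}" for s in ordered
    )
```

YouTube expects the first chapter to start at `0:00`, so a real export would probably need to check that the first predicted segment begins at the start of the video.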


## Why


In our own experience, such content either doesn’t get done, is time-consuming, or requires work from outside parties. In particular, we noticed this in the following courses we’ve been a part of:

1. Fast.ai course - During the course, students manually create [YouTube chapter markers](https://forums.fast.ai/t/help-wanted-youtube-chapter-markers/96306), [lesson transcripts](https://forums.fast.ai/t/help-wanted-transcriptions/96307), and summaries on the forums.

2. FSDL course - The chapter markers and [lesson notes](https://fullstackdeeplearning.com/course/2022/lecture-2-development-infrastructure-and-tooling/) are created manually and then shared on the FSDL website, usually one week after each lesson.

## How is our application structured?

![System Diagram](https://user-images.githubusercontent.com/24592806/195852655-3e22b972-d09f-4646-8f75-70a04bda2081.png)

## What have we done so far?

Let’s look at the dataset, ML library, API, and web application we created for our prototype system.

### Dataset 

Since we had to train summarization and topic segmentation models, we manually created our dataset from a range of YouTube videos, including fastai lessons, FSDL lessons, and other instructional videos.

[Dataset Link](https://huggingface.co/datasets/kurianbenoy/Course_summaries_dataset)

![Dataset Schema](https://user-images.githubusercontent.com/24592806/195852870-b9f2acb2-99a7-44a4-92f1-5f465cd1a45b.png)
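
If you want to inspect the dataset yourself, something like the following should work, assuming the Hugging Face `datasets` package is installed and the dataset is publicly accessible. `load_course_summaries` is a hypothetical helper written for this example. Only the dataset ID comes from the link above, and nothing here assumes particular split or column names.

```python
DATASET_ID = "kurianbenoy/Course_summaries_dataset"  # taken from the dataset link


def load_course_summaries():
    """Download the dataset from the Hugging Face Hub (needs network access)."""
    # Imported lazily so this snippet can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(DATASET_ID)
```

Calling `print(load_course_summaries())` lists the available splits and their columns, which is a quick way to compare against the schema shown above.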

### ML library: course_copilot

We used the nbdev framework to create a Python package that serves as our framework for model training and model serving. We integrated Weights & Biases (W&B) for experiment tracking and for fine-tuning models with sweeps. We created model trainers for the tasks of topic segmentation and summarization.
The timing of our project coincided with the release of Whisper, which we use to transcribe the YouTube video whose URL you pass in. The transcript provides the data required for creating topic segments and summaries.
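
To give a feel for the transcription step, here is a minimal sketch using the open-source `openai-whisper` package. It assumes the audio has already been downloaded to a local file (fetching it from YouTube is not shown) and that `ffmpeg` is available. The function names and the `"base"` model size are choices made for this example and do not reflect how `course_copilot` wraps Whisper.

```python
def transcribe(audio_path: str, model_name: str = "base") -> list[dict]:
    """Transcribe a local audio file with Whisper and return its timed segments."""
    # Lazy import: requires the `openai-whisper` package and ffmpeg.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    return result["segments"]


def clean_segments(segments: list[dict]) -> list[dict]:
    """Keep only start, end, and stripped text, dropping empty segments."""
    return [
        {"start": seg["start"], "end": seg["end"], "text": seg["text"].strip()}
        for seg in segments
        if seg["text"].strip()
    ]
```

The timed segments are what make chapter markers possible: each piece of text carries a start and end time that later stages can use when proposing topic boundaries.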

[fsdl_2022_course_project](https://github.com/ohmeow/fsdl_2022_course_project)

![nbdev based Model Trainer for Topic Segmentation, Experiment tracking with W&B](https://user-images.githubusercontent.com/24592806/195852790-fef77960-1066-414d-8568-a177f4d1159e.png)

### Backend API

For the backend, we used FastAPI to create the APIs. Our API uses dagster as the workflow engine to create tasks that run the inference jobs: transcribing the video with Whisper, running topic segmentation, and running the summarization models.
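
The order of the inference stages can be sketched in plain Python. This is only an illustration of the data flow: it uses ordinary functions passed in as arguments rather than dagster tasks or FastAPI routes, and `run_inference` and its parameters are hypothetical names that do not appear in the backend repo.

```python
from typing import Callable


def run_inference(
    video_url: str,
    transcribe: Callable[[str], list[dict]],
    segment_topics: Callable[[list[dict]], list[list[dict]]],
    summarize: Callable[[str], str],
) -> list[dict]:
    """Run transcription, then topic segmentation, then summarization."""
    transcript = transcribe(video_url)       # timed lines of text
    topics = segment_topics(transcript)      # lines grouped by topic
    results = []
    for topic in topics:
        if not topic:
            continue  # skip empty groups so topic[0] below is always valid
        text = " ".join(line["text"] for line in topic)
        results.append({"start": topic[0]["start"], "summary": summarize(text)})
    return results
```

Passing the stages in as callables keeps the sketch testable with stand-in functions. In the real system each stage would be a workflow task backed by a model.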

[fsdl-2022-group-007-app](https://github.com/suvash/fsdl-2022-group-007-app)

![Course Copilot APIs](https://user-images.githubusercontent.com/24592806/195852828-52217828-cab7-4b7f-b6e6-227a5cb27abe.png)

### Web Application

We created our front-end web application using Vue 3 and Quasar. It is deployed to GitHub Pages from our repo.

[fsdl-2022-group-007-web](https://github.com/ohmeow/fsdl-2022-group-007-web)


![Topic summaries and chapter summaries generated](https://user-images.githubusercontent.com/24592806/195858845-4ba257ea-935f-4e57-8650-732e493aa7b3.png)



## Future Plans

- Improve quality of training data
- Allow users to save their corrected headlines and summaries
- Add ability for users to update topic spans
- Implement data flywheel
- Implement chapter marker and quarto export features
- Add authentication/authorization

## Install

```sh
pip install course_copilot
```

## Setting up your development environment

Please take some time to read up on nbdev (how it works, [directives](https://nbdev.fast.ai/explanations/directives.html), etc.) by checking out [the walk-throughs](https://nbdev.fast.ai/tutorials/tutorial.html) and [tutorials](https://nbdev.fast.ai/tutorials/) on the nbdev [website](https://nbdev.fast.ai/).

## Step 1: Create conda environment

After cloning the repo, create a conda environment. This will install nbdev alongside other libraries likely required for this project.

`mamba env create -f environment.yml`


## Step 2: Install Quarto

`nbdev_install_quarto`


## Step 3: Install hooks 

`nbdev_install_hooks`


## Step 4: Add pre-commit hooks (optional)
If using VSCode, you can install pre-commit hooks “to catch and fix uncleaned and unexported notebooks” before pushing to git. See [the instructions in the nbdev documentation](https://nbdev.fast.ai/tutorials/pre_commit.html) if you want to use this feature.


## Step 5: Install our library

`pip install -e '.[dev]'`