

@absadiki
Contributor

absadiki commented Dec 21, 2024

This PR is a step towards resolving #154.

It introduces a VideoConverter class that converts videos to markdown by:

  • Extracting metadata (if exiftool is installed)
  • Performing speech transcription (if speech_recognition and pydub are installed)
  • Generating a summary of the transcription via a multimodal LLM (optional; enabled by default when llm_client is configured)
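Below is a minimal sketch of how the three optional steps could be gated on what is installed. It is not the code in this PR. The helper names available_steps, extract_metadata and transcribe are hypothetical. Using recognize_google (Google's web speech API, which needs network access) as the speech_recognition backend is also an assumption.

```python
import importlib.util
import json
import os
import shutil
import subprocess
import tempfile


def available_steps(llm_client=None, summarize=True):
    """Report which optional conversion steps can run here (hypothetical helper)."""
    has_exiftool = shutil.which("exiftool") is not None
    has_transcription = (
        importlib.util.find_spec("speech_recognition") is not None
        and importlib.util.find_spec("pydub") is not None
    )
    return {
        "metadata": has_exiftool,
        "transcription": has_transcription,
        # The summary is generated from the transcript, so it needs both
        # a configured LLM client and working transcription.
        "summary": bool(summarize and llm_client is not None and has_transcription),
    }


def extract_metadata(path):
    """Return exiftool's metadata for `path` as a dict, or {} if exiftool is absent."""
    if shutil.which("exiftool") is None:
        return {}
    # -json prints a JSON array with one object per input file.
    result = subprocess.run(
        ["exiftool", "-json", path], capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)[0]


def transcribe(path):
    """Transcribe the audio track of `path`, or return None if the packages are missing."""
    try:
        import speech_recognition as sr
        from pydub import AudioSegment
    except ImportError:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        wav_path = os.path.join(tmp, "audio.wav")
        # pydub decodes the video's audio track via ffmpeg and writes it as WAV,
        # a format speech_recognition can read directly.
        AudioSegment.from_file(path).export(wav_path, format="wav").close()
        recognizer = sr.Recognizer()
        with sr.AudioFile(wav_path) as source:
            audio = recognizer.record(source)
        # Assumed backend: Google's free web speech API (requires network access).
        return recognizer.recognize_google(audio)
```

Each step degrades independently. A missing exiftool yields empty metadata, and missing packages yield no transcript and therefore no summary.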

Notes:

  • I don't think checking the file type by extension is ideal. There are many video extensions, and checking the MIME type instead would cover a wider range of video files.
  • I'm unsure about the testing strategy: should we focus only on testing exiftool? Please share your thoughts.
  • Additionally, I suggest refactoring Mp3Converter into a more general AudioConverter, as there are many audio extensions to consider. If you agree with this, I can submit a separate PR for it.
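Here is a sketch of the MIME-type approach using Python's standard mimetypes module. The helper name media_kind is hypothetical. mimetypes.guess_type still infers the type from the file name. It widens coverage through its built-in and system type tables, not by inspecting file contents. The same check also covers the AudioConverter idea, because audio files map to audio/* types.

```python
import mimetypes


def media_kind(path):
    """Return "video", "audio", or None based on the guessed MIME type (hypothetical helper)."""
    mime_type, _encoding = mimetypes.guess_type(path)
    if mime_type is None:
        return None
    # Keep only the major type, e.g. "video" from "video/mp4".
    major = mime_type.split("/", 1)[0]
    return major if major in ("video", "audio") else None
```

With this check, a VideoConverter and a generalized AudioConverter could each accept any extension that maps to their major type, without maintaining their own extension lists.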

@l-lumin
Contributor

l-lumin commented Dec 22, 2024

Could you add tests?

@absadiki
Contributor Author

@l-lumin, could you provide a sample video file that is allowed to be uploaded to the repo?

@l-lumin
Contributor

l-lumin commented Dec 23, 2024


I think you can use the file you tested with locally. If it turns out to be unsuitable, it can be changed later.

@absadiki
Contributor Author

@l-lumin, okay, I created a sample video file using ffmpeg and added a test for exiftool for now.
Maybe we can add tests for transcription as well, but #194 should be merged first.
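One alternative is a test that generates the sample with ffmpeg at test time instead of committing it, and skips when the tools are missing. This is a hypothetical sketch, not the test added in this PR. The class name, the file name and the lavfi testsrc/sine inputs are assumptions.

```python
import json
import os
import shutil
import subprocess
import tempfile
import unittest


@unittest.skipUnless(
    shutil.which("ffmpeg") and shutil.which("exiftool"),
    "requires ffmpeg and exiftool on PATH",
)
class VideoMetadataTest(unittest.TestCase):
    def test_exiftool_reads_generated_video(self):
        with tempfile.TemporaryDirectory() as tmp:
            video = os.path.join(tmp, "sample.mp4")
            # One second of ffmpeg's built-in test pattern plus a 440 Hz tone.
            subprocess.run(
                ["ffmpeg", "-y", "-loglevel", "error",
                 "-f", "lavfi", "-i", "testsrc=duration=1:size=320x240:rate=10",
                 "-f", "lavfi", "-i", "sine=frequency=440:duration=1",
                 "-shortest", video],
                check=True,
            )
            # -json prints a JSON array with one object per input file.
            out = subprocess.run(
                ["exiftool", "-json", video], capture_output=True, text=True, check=True
            )
            metadata = json.loads(out.stdout)[0]
            self.assertEqual(metadata["MIMEType"], "video/mp4")
            self.assertIn("Duration", metadata)
```

Run it with python -m unittest. It is skipped automatically when ffmpeg or exiftool is not on PATH, so CI without those tools still passes.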

@hariappointy

Will this be merged soon?
