
Video imagery to text (Closed Captioning) #11770

Closed
A2113S opened this issue Oct 13, 2023 · 8 comments
Labels
🤖:enhancement (A large net-new component, integration, or chain. Use sparingly. The largest features), Ɑ: models (Related to LLMs or chat model modules)

Comments


A2113S commented Oct 13, 2023

Feature request

Implement a feature using Langchain's image_captions.py and audio_speech_to_text.py to produce .srt files. This system will provide both subtitles and visual scene descriptions, essentially creating closed captioning.
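As a rough sketch of the pipeline shape (the two helper functions below are hypothetical placeholders standing in for the image-caption and speech-to-text steps, not LangChain APIs):

# Hypothetical sketch: merge timed visual captions and a timed transcript
# into one ordered list of cues that can then be written out as SRT.
from typing import List, Tuple

Cue = Tuple[float, float, str]  # (start_seconds, end_seconds, text)

def get_scene_captions(video_path: str) -> List[Cue]:
    ...  # e.g. caption sampled frames via image_captions.py

def get_transcript(video_path: str) -> List[Cue]:
    ...  # e.g. transcribe the audio track via audio_speech_to_text.py

def build_cues(video_path: str) -> List[Cue]:
    # Bracket the visual descriptions so they are distinguishable from speech.
    cues = [(s, e, f"[{text}]") for s, e, text in get_scene_captions(video_path)]
    cues += get_transcript(video_path)
    return sorted(cues, key=lambda cue: cue[0])  # order by start time for SRT numbering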

Motivation

Current closed captioning primarily captures spoken content, omitting crucial visual elements of videos. This lack of comprehensive description hinders a full understanding, especially for the hearing-impaired community. Our goal is to elevate video accessibility by integrating both visual and auditory components in closed captioning, ensuring a more inclusive viewing experience.

Your contribution

We are students at the University of Toronto, interested in contributing to Langchain for a school project. We would like to submit a PR that implements this feature sometime in November.

dosubot added the Ɑ: models (Related to LLMs or chat model modules) and 🤖:enhancement (A large net-new component, integration, or chain. Use sparingly. The largest features) labels Oct 13, 2023

dosubot bot commented Oct 13, 2023

🤖

Thank you for your interest in contributing to LangChain! Your proposed feature to generate .srt files with both subtitles and visual scene descriptions sounds like a valuable addition to the framework.

We appreciate your motivation to enhance video accessibility and provide a more inclusive viewing experience. It's great to see your commitment to addressing the needs of the hearing-impaired community.

Please go ahead and work on implementing this feature. We look forward to reviewing your pull request in November. If you have any questions or need any guidance during the process, feel free to ask. Good luck with your school project!




A2113S commented Oct 13, 2023

@dosu-bot What would the next steps look like? We should probably start by creating an outline for the implementation of this feature. Since LangChain already has an image analyzer and a speech-to-text feature, the main goal would be to break a video down into frames and then combine the per-frame output into something that is accurate and coherent.

Also, can I add my group members (GitHub accounts) to this issue? Just so we are all on the same page.


dosubot bot commented Oct 15, 2023

🤖

Hello,

It's great to hear that you're interested in contributing to LangChain! Your feature idea sounds very promising and would be a valuable addition to the project.

To start implementing this feature, you can follow the steps outlined in the CONTRIBUTING.md file. This includes forking the repository, creating a new branch, implementing the feature, committing and pushing your changes, and finally opening a pull request.

As you've mentioned, LangChain already has an image analyzer and speech-to-text feature. Your main task would be to break down a video into frames and then combine the output of each frame into something that makes sense. This will likely involve using the image_captions.py and audio_speech_to_text.py modules, and possibly others.
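As a rough illustration of that step, here is a minimal sketch that samples frames with OpenCV and captions each one with ImageCaptionLoader (the path_images parameter name and the one-frame-per-second sampling are assumptions; check the current loader docs):

# Sample roughly one frame per second with OpenCV and caption it with
# LangChain's ImageCaptionLoader (BLIP under the hood).
import cv2
from langchain.document_loaders import ImageCaptionLoader

def caption_frames(video_path: str, every_n_seconds: float = 1.0):
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    step = max(1, int(fps * every_n_seconds))
    captions = []  # list of (timestamp_seconds, caption_text)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            frame_path = f"frame_{idx}.jpg"
            cv2.imwrite(frame_path, frame)
            docs = ImageCaptionLoader(path_images=[frame_path]).load()
            captions.append((idx / fps, docs[0].page_content))
        idx += 1
    cap.release()
    return captions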

For the .srt file format, you can refer to the pysrt library that LangChain uses to load and parse .srt files. The standard structure of an .srt file is as follows:

1
00:00:20,000 --> 00:00:24,400
This is subtitle text.

2
00:00:25,000 --> 00:00:27,000
This is another subtitle text.

Each subtitle entry consists of a sequence number, the start and end times of the subtitle, the subtitle text itself, and a blank line indicating the end of the subtitle entry.
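For reference, a minimal pysrt sketch (timestamps taken from the example above) that builds those two entries and writes them to disk:

# Build the two example cues with pysrt and save them as an .srt file.
import pysrt

subs = pysrt.SubRipFile()
subs.append(pysrt.SubRipItem(
    index=1,
    start=pysrt.SubRipTime(0, 0, 20, 0),
    end=pysrt.SubRipTime(0, 0, 24, 400),
    text="This is subtitle text.",
))
subs.append(pysrt.SubRipItem(
    index=2,
    start=pysrt.SubRipTime(0, 0, 25, 0),
    end=pysrt.SubRipTime(0, 0, 27, 0),
    text="This is another subtitle text.",
))
subs.save("captions.srt", encoding="utf-8")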

To add your group members to this issue, you can navigate to the issue in the LangChain repository and click on the 'assign yourself' link on the right side of the issue page. If you do not have the necessary permissions to do this, you can comment on the issue asking to be assigned.

Please remember that all code contributions must pass the formatting, linting, and testing checks before they can be merged. You can run these checks locally using the commands make format, make lint, and make test.

I hope this helps! If you have any further questions, feel free to ask.

Best of luck with your project!




@nicoledroi1

@dosu-bot Hello, I am also part of the group contributing to this issue. Can you please assign me to this issue?


A2113S commented Nov 5, 2023

@dosu-bot Hello! When trying to use the ImageCaptionLoader, we were following its documentation, and we get the error "'NoneType' object is not callable" at the line processor = BlipProcessor.from_pretrained(self.blip_processor)


dosubot bot commented Nov 5, 2023

🤖

The error message "NoneType object is not callable" typically indicates that you're trying to call a None object as if it were a function or method. In your case, the error is happening at the line processor = BlipProcessor.from_pretrained(self.blip_processor), which suggests that BlipProcessor.from_pretrained might be None.

Here are a few potential reasons for this issue:

  1. The transformers package might not be installed or properly installed. You can verify this by running pip show transformers in your terminal. If it's not installed, you can install it using pip install transformers.

  2. The BlipProcessor class might not be correctly imported from the transformers package. You can verify this by checking if BlipProcessor is available in the transformers package by running dir(transformers) in your Python interpreter.

  3. The from_pretrained method might not be available in the BlipProcessor class. You can verify this by checking if from_pretrained is a method of BlipProcessor by running dir(BlipProcessor) in your Python interpreter.

  4. The self.blip_processor might not be a valid pre-trained model. You can verify this by checking the value of self.blip_processor. It should be a valid pre-trained model available in the Hugging Face model hub.

If all the above checks pass, then the issue might be with the transformers package itself. You might need to update it to the latest version using pip install --upgrade transformers. If the issue still persists, it would be best to raise an issue in the transformers GitHub repository.

Unfortunately, I couldn't find specific details about how the BlipProcessor.from_pretrained(self.blip_processor) method is implemented in the LangChain codebase. If you could provide more information about your implementation or the steps leading up to the error, it might help in diagnosing the problem.
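As a quick sanity check, here is a minimal sketch (assuming the default BLIP checkpoint, Salesforce/blip-image-captioning-base) that verifies the transformers classes resolve and load outside of LangChain:

# If this runs cleanly, transformers and BLIP are fine, and the NoneType error
# most likely comes from how the loader was constructed or from an optional
# import that silently fell back to None.
from transformers import BlipProcessor, BlipForConditionalGeneration

checkpoint = "Salesforce/blip-image-captioning-base"  # assumed default checkpoint
processor = BlipProcessor.from_pretrained(checkpoint)
model = BlipForConditionalGeneration.from_pretrained(checkpoint)
print(type(processor).__name__, type(model).__name__)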




@LunarECL
Contributor

@dosu-bot We created a new chain that uses some new dependencies the other existing chains don't have. When we run the tests, they fail because they cannot find the specified module. For example, we are using OpenCV to split videos into frames. How should we add the opencv dependency to langchain so that the tests for the other chains still pass?

We keep getting ModuleNotFoundError: No module named 'cv2' for every existing chain when trying to run the tests.
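One common pattern in the LangChain codebase for optional dependencies is a guarded import inside the code that needs it, so chains and tests that never touch OpenCV never import it; a minimal sketch (the error message wording is illustrative):

# Guarded import: only the video-captioning code requires opencv-python at runtime.
def _get_cv2():
    try:
        import cv2
    except ImportError as e:
        raise ImportError(
            "opencv-python is required for video captioning. "
            "Install it with `pip install opencv-python`."
        ) from e
    return cv2

The extra package would also typically be declared under the optional / extended-testing extras in pyproject.toml so that only the extended test suite installs it.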

dosubot added the stale label (Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed) Mar 11, 2024

dosubot bot commented Mar 11, 2024

Hi, @A2113S,

I'm helping the LangChain team manage their backlog and am marking this issue as stale. The issue was raised by you to request the implementation of a feature using Langchain's image_captions.py and audio_speech_to_text.py to generate .srt files for closed captioning, aiming to enhance accessibility for the hearing-impaired community. The issue gained interest from dosubot, LunarECL, nicoledroi1, and ifritzler, who provided guidance and assistance on the implementation steps, dependencies, and error troubleshooting. The students from the University of Toronto are aiming to contribute to Langchain for a school project and have received support and instructions from the maintainers of the repository.

Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days. Thank you!

dosubot closed this as not planned (won't fix, can't repro, duplicate, stale) Mar 18, 2024
dosubot removed the stale label Mar 18, 2024
baskaryan added a commit that referenced this issue Mar 30, 2024
…14059)

Description: Video imagery to text (Closed Captioning)
This pull request introduces the VideoCaptioningChain, a tool for
automated video captioning. It processes audio and video to generate
subtitles and closed captions, merging them into a single SRT output.

Issue: #11770
Dependencies: opencv-python, ffmpeg-python, assemblyai, transformers,
pillow, torch, openai
Tag maintainer:
@baskaryan
@hwchase17


Hello!

We are a group of students from the University of Toronto
(@LunarECL, @TomSadan, @nicoledroi1, @A2113S) who want to make a
contribution to the LangChain community! We have run make format, make
lint, and make test locally before submitting the PR. To our knowledge,
our changes do not introduce any new errors.

Thank you for taking the time to review our PR!

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
azurewtl pushed a commit to azurewtl/langchain that referenced this issue Mar 30, 2024
gkorland pushed a commit to FalkorDB/langchain that referenced this issue Mar 30, 2024
marlenezw pushed a commit to marlenezw/langchain that referenced this issue Apr 2, 2024
Je-Cp pushed a commit to Je-Cp/jcp-langchain that referenced this issue Apr 2, 2024
hinthornw pushed a commit that referenced this issue Apr 26, 2024