An innovative Python project that pioneers multimedia transcription and introduces a multi-gender artificial voice with advanced text conversion capabilities

snawaza243/MediaPad

MediaPad

MediaPad is a Python-based desktop application that combines multimedia transcription, language conversion, and artificial voice generation in one tool. The application is still under development, but its core features are functional.

Features

MediaPad includes the following features for multimedia transcription, language conversion, and voice generation:

Language Conversion

MediaPad includes a language converter that translates text between the languages supported by its translation backend. The converter relies on machine translation, which makes it useful for people who work with content in multiple languages.
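The following is a minimal sketch of how such a converter could be built on googletrans, which this README lists among its libraries. It is not taken from the MediaPad source: the helper names `resolve_language` and `translate_text` are hypothetical, and the synchronous `Translator().translate(...)` call is an assumption that holds for the googletrans 3.x and 4.0.0rc1 releases (newer releases may expose an async API instead).

```python
def resolve_language(name_or_code, languages):
    """Return the language code for a code ('hi') or a name ('Hindi').

    `languages` maps lowercase codes to lowercase names, in the same shape
    as googletrans.LANGUAGES. Returns None for an unsupported language.
    """
    key = name_or_code.strip().lower()
    if key in languages:
        return key
    for code, name in languages.items():
        if name.lower() == key:
            return code
    return None


def translate_text(text, dest, src="auto"):
    """Translate `text` into `dest`; hypothetical helper, needs network access."""
    # Third-party import kept local so the helper above works without it.
    from googletrans import LANGUAGES, Translator

    code = resolve_language(dest, LANGUAGES)
    if code is None:
        raise ValueError(f"Unsupported target language: {dest!r}")
    # Assumption: synchronous API (googletrans 3.x / 4.0.0rc1).
    return Translator().translate(text, src=src, dest=code).text
```

Validating the target language first lets the application show a clear error for an unsupported language instead of failing inside the network call.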

Voice Generation

MediaPad can generate artificial male and female voices in a range of accents, including foreign and Indian Hindi accents. This feature is useful for people who need to add voiceovers to their multimedia content or require artificial voices for their projects.
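As an illustration of gender-based voice selection, here is a minimal sketch using pyttsx3, one of the libraries this README lists. The helpers `pick_voice` and `speak` are hypothetical names, not MediaPad functions. The sketch assumes that each installed voice exposes `id`, `name`, and `gender` attributes; some speech drivers leave `gender` unset, in which case the engine's default voice is used.

```python
def _mentions(text, gender):
    """True if `text` names `gender`; guards against 'male' matching 'female'."""
    text = text.lower()
    if gender == "female":
        return "female" in text
    return "male" in text and "female" not in text


def pick_voice(voices, gender):
    """Return the first voice whose gender or name matches, else None."""
    wanted = gender.strip().lower()
    for voice in voices:
        declared = getattr(voice, "gender", None) or ""
        name = getattr(voice, "name", None) or ""
        if _mentions(declared, wanted) or _mentions(name, wanted):
            return voice
    return None


def speak(text, gender="female"):
    """Speak `text` with a matching voice, or the default voice if none match."""
    import pyttsx3  # third-party; imported lazily

    engine = pyttsx3.init()
    voice = pick_voice(engine.getProperty("voices"), gender)
    if voice is not None:
        engine.setProperty("voice", voice.id)
    engine.say(text)
    engine.runAndWait()
```

The set of available voices depends on what the operating system has installed, so an application would normally list them at startup rather than hard-code voice identifiers.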

Voice to Text

The application also includes speech recognition capabilities, allowing users to convert spoken words into text. This feature is particularly useful for people who prefer to dictate their content rather than typing it.
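A dictation step of this kind might look like the sketch below, which uses the SpeechRecognition library listed in this README. The function name `listen_once` and the default language tag `en-IN` are illustrative assumptions, and the choice of `recognize_google` as the recognition backend is also an assumption, since the README does not show which recognizer MediaPad calls. Microphone input additionally requires PyAudio.

```python
def listen_once(language="en-IN", timeout=5):
    """Record one phrase from the default microphone and return its transcript.

    Returns None when no speech starts within `timeout` seconds or when the
    audio cannot be understood. Network errors (sr.RequestError) propagate
    to the caller.
    """
    # Third-party import kept local; sr.Microphone also needs PyAudio.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        # Sample background noise briefly so quiet speech is still detected.
        recognizer.adjust_for_ambient_noise(source, duration=0.5)
        try:
            audio = recognizer.listen(source, timeout=timeout)
        except sr.WaitTimeoutError:
            return None
    try:
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        return None
```

Returning None for silence and unintelligible audio lets the interface show a "please try again" message, while a network failure remains a distinct error.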

Advanced Text Notepad

MediaPad comes with an advanced text notepad for creating and editing documents such as articles and reports.

Text Translator

The application also provides a text translator that translates text between supported languages. Like the language converter, it relies on machine translation, making it a useful tool for people who deal with content in multiple languages.

Technologies Used

MediaPad was developed using a range of Python libraries, including:

  • tkinter (standard library): The GUI library used to create the desktop application's interface, including its scrolledtext, filedialog, simpledialog, messagebox, and ttk submodules for widgets and dialogs.
  • tktooltip: A library for creating tooltips in tkinter.
  • PIL (Python Imaging Library): The library used for working with images.
  • SpeechRecognition (imported as speech_recognition): The library used for speech recognition capabilities.
  • gTTS (imported as gtts): The Google Text-to-Speech library used to generate artificial voices.
  • pyttsx3: An offline text-to-speech library.
  • googletrans: A library for text translation using Google services.
  • pydub: The library used to manipulate audio files.
  • playsound, winsound: Modules for playing sound files (winsound is part of the standard library and is available only on Windows).
  • threading (standard library): The module used for working with threads.
  • requests: A library for sending HTTP requests.
  • webbrowser, json, os, datetime (standard library): Modules for opening web browsers, working with JSON data, interacting with the operating system, and handling dates and times.
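The README does not say how threading is used in MediaPad. One common pattern in Tkinter applications, sketched below under that caveat, is to run slow work such as translation or speech synthesis on a background thread and hand the result back through a queue, so the window stays responsive. The names `run_in_background` and `poll_results` are hypothetical.

```python
import queue
import threading


def run_in_background(task, results, *args):
    """Run task(*args) on a daemon thread.

    Puts ("ok", value) on `results` when the task succeeds, or
    ("error", exception) when it raises. Returns the started thread.
    """
    def worker():
        try:
            results.put(("ok", task(*args)))
        except Exception as exc:  # report the failure instead of losing it
            results.put(("error", exc))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def poll_results(results, handle):
    """Pass every finished result to handle(status, payload) without blocking.

    Returns the number of results handled.
    """
    handled = 0
    while True:
        try:
            status, payload = results.get_nowait()
        except queue.Empty:
            return handled
        handle(status, payload)
        handled += 1
```

In a Tkinter application, `poll_results` would typically be scheduled repeatedly with the widget method `after`, so that widgets are only updated from the main thread.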

Getting Started

To use MediaPad, download the application and install the libraries listed under Technologies Used. Once they are installed, start the application and select a feature from its interface.
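Because the README does not give installation commands, the sketch below shows one way to check which third-party libraries are missing before the first run. The mapping from import names to pip package names is an assumption based on the usual package names on PyPI, not something stated in this project, and `find_missing` and `install_hint` are hypothetical helpers. tkinter is omitted because it ships with most Python installations rather than being installed through pip.

```python
import importlib.util

# Import name -> assumed pip package name (not confirmed by this project).
REQUIRED = {
    "speech_recognition": "SpeechRecognition",
    "gtts": "gTTS",
    "pyttsx3": "pyttsx3",
    "googletrans": "googletrans",
    "pydub": "pydub",
    "playsound": "playsound",
    "PIL": "Pillow",
    "requests": "requests",
    "tktooltip": "tkinter-tooltip",
}


def find_missing(modules):
    """Return the top-level module names that cannot be imported."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def install_hint(missing, packages=REQUIRED):
    """Build a pip command for the missing modules, or '' if none are missing."""
    if not missing:
        return ""
    names = [packages.get(name, name) for name in missing]
    return "pip install " + " ".join(names)
```

Calling `print(install_hint(find_missing(REQUIRED)))` prints a suggested install command, or an empty line when everything is already available.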

Contributors

MediaPad was developed by Shahnawaz Alam (11202722) under the guidance of Rajeev Gupta Sir at MMDU, Mullana, Ambala.

Version History

The current version of MediaPad is 3.0, released on June 11, 2023.

Version 3.0 (June 2023)

  • Final release of the project with all major features and bug fixes.
  • Improved user experience and performance.
  • Date: June 11, 2023.

Version 2.1 (September 2022)

  • Fixed bugs related to the new text summarization feature.
  • Improved the accuracy of sentiment analysis.
  • Date: September 15, 2022.

Version 2.0 (August 2022)

  • Major update with new features including text summarization and sentiment analysis.
  • Improved user interface with new tabs and options.
  • Date: August 1, 2022.

Version 1.2 (June 2022)

  • Improved the text-to-speech feature with new voices and accents.
  • Fixed minor bugs related to the notepad feature.
  • Date: June 15, 2022.

Version 1.1 (April 2022)

  • Added support for additional languages in the text translation feature.
  • Fixed bugs related to text-to-speech and speech-to-text functionality.
  • Improved user interface with new language options and error messages.
  • Date: April 1, 2022.

Version 1.0 (February 2022)

  • First stable release of the project with all four major features.
  • Fixed minor bugs and improved user experience.
  • Date: February 10, 2022.

Version 0.4 (January 2022)

  • Added a notepad feature for users to save and edit text.
  • Improved user interface with a new tab for the notepad feature.
  • Tested and verified notepad functionality.
  • Date: January 15, 2022.

Version 0.3 (December 2021)

  • Added the ability to transcribe speech to text using the Google Cloud Speech-to-Text API.
  • Improved user interface with new buttons and options.
  • Tested and verified speech-to-text functionality.
  • Date: December 20, 2021.

Version 0.2 (November 2021)

  • Added the ability to convert text to speech using the Google Text-to-Speech API.
  • Improved user interface with new buttons and options.
  • Tested and verified text-to-speech functionality.
  • Date: November 15, 2021.

Version 0.1 (October 2021)

  • Added the ability to translate text from any language into another.
  • Basic user interface with text input and output fields.
  • Tested and verified basic functionality.
  • Date: October 10, 2021.

License

This project is licensed under the MIT License. See the LICENSE file for more information.
