sicsr-lab/python-package

Description

This project creates a Python package for cleaning Hindi datasets. It will be a command-line tool for pre-processing Hindi text. Its planned capabilities include:

  • Reducing a given file to Hindi characters only.
  • Splitting paragraphs into sentences.
  • Removing punctuation from the dataset, if required.
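
The three capabilities above could be sketched roughly as follows. This is only an illustration, not the package's actual code: the function names (`keep_hindi_only`, `split_sentences`, `remove_punctuation`, `clean_text`, `main`) and the `--strip-punctuation` flag are hypothetical. It also assumes that "Hindi characters" means the Devanagari Unicode block (U+0900 to U+097F) and that sentences end at a danda (।), double danda (॥), `?` or `!`.

```python
import argparse
import re
import string

# Assumption: "Hindi characters" = the Devanagari Unicode block U+0900..U+097F.
NON_HINDI = re.compile(r"[^\u0900-\u097F\s]")
# Assumption: a sentence ends at a danda, double danda, '?' or '!'.
SENTENCE = re.compile(r"[^\u0964\u0965?!]+[\u0964\u0965?!]*")
# ASCII punctuation plus danda, double danda and the Devanagari abbreviation sign.
PUNCT = re.compile("[" + re.escape(string.punctuation) + "\u0964\u0965\u0970]")


def keep_hindi_only(text):
    """Replace non-Devanagari characters with spaces, then collapse whitespace."""
    return " ".join(NON_HINDI.sub(" ", text).split())


def split_sentences(text):
    """Split a paragraph into sentences, keeping each terminating mark."""
    return [s.strip() for s in SENTENCE.findall(text) if s.strip()]


def remove_punctuation(text):
    """Replace punctuation marks with spaces, then collapse whitespace."""
    return " ".join(PUNCT.sub(" ", text).split())


def clean_text(text, strip_punctuation=False):
    """Hypothetical pipeline: split, keep Hindi only, optionally drop punctuation."""
    sentences = []
    for raw in split_sentences(text):
        cleaned = keep_hindi_only(raw)
        if strip_punctuation:
            cleaned = remove_punctuation(cleaned)
        if cleaned:  # skip sentences that contained no Hindi text at all
            sentences.append(cleaned)
    return sentences


def main(argv=None):
    """Hypothetical command-line entry point; prints one cleaned sentence per line."""
    parser = argparse.ArgumentParser(description="Clean a Hindi text file (sketch).")
    parser.add_argument("path", help="UTF-8 text file to clean")
    parser.add_argument("--strip-punctuation", action="store_true")
    args = parser.parse_args(argv)
    with open(args.path, encoding="utf-8") as handle:
        for sentence in clean_text(handle.read(), args.strip_punctuation):
            print(sentence)
```

Splitting before filtering keeps `?` and `!` available as sentence boundaries. The Devanagari-only filter would otherwise discard them. To run the sketch as a script, `main()` would be called under an `if __name__ == "__main__":` guard.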

Technologies In Use

  • Python
  • Data Science

Number of members required: 2

Start Date: 11-08-2020

Expected Deadline: 20-09-2020
