
Commit c3ad8bf

Merge pull request avinashkranjan#750 from zaverisanya/master
POS removal from hindi text
2 parents 9eb20a3 + f99719f commit c3ad8bf


6 files changed (+937 -0 lines changed)

Remove_POS_hindi_text/Input.png

108 KB

Remove_POS_hindi_text/Only_Hindi.txt

Lines changed: 1 addition & 0 deletions

Remove_POS_hindi_text/Output.png

181 KB

Remove_POS_hindi_text/README.md

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
# Remove POS Tags from Hindi Text

Removes part-of-speech (POS) tags from a POS-tagged Hindi corpus, leaving only the Hindi words.

--> Package installed: NLTK
- NLTK stands for 'Natural Language Toolkit'. It includes implementations of the most common text-processing algorithms, such as tokenization, part-of-speech tagging, stemming, sentiment analysis, topic segmentation, and named entity recognition. NLTK helps a computer analyze, preprocess, and understand written text.

## Setup instructions

--> How to set up and run the script locally:
- Install NLTK (for example, with `pip install nltk`) and import it by writing `import nltk` on the first line of your script.
- Save the `Tagged_Hindi_Corpus.txt` file at a location of your choice.
- In the code, set the path in `fp=open(r"...")` to the location of the file you saved in the previous step.
- In the code, set the path in `fd=open(r"...")` to the location where the output file, containing only the Hindi text after POS removal, should be written.
- Note that I have already run this script, so the `only_hindi.txt` output file already exists. Delete `only_hindi.txt` before running the script yourself, then open it after the run to see the result.
- Run the script with `python hindi_POS_tag_removal.py` (or `python <name of your py file.py>`).
- The output file will then contain only the Hindi text.

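A minimal sketch of the file handling described above, assuming each whitespace-separated token in the tagged corpus has the form `word_TAG` (the real corpus format is not shown in this README). This is not the author's script, and the function name `remove_pos_from_file` is hypothetical. Passing both paths as arguments replaces editing the `fp=open(r"...")` and `fd=open(r"...")` lines by hand; `encoding="utf-8"` avoids decoding Devanagari with the platform default encoding (which on Windows is often not UTF-8), and opening the output with `"w"` truncates any existing file, so in this sketch a rerun replaces the old `only_hindi.txt` instead of adding to it.

```python
def remove_pos_from_file(input_path, output_path, sep="_"):
    """Write the words of a POS-tagged corpus to output_path without their tags.

    Assumption: whitespace-separated tokens of the form "word<sep>TAG"
    (hypothetical format; "_" by default).
    """
    # Read the tagged corpus explicitly as UTF-8 so Devanagari decodes correctly.
    with open(input_path, encoding="utf-8") as fp:
        tagged_lines = fp.read().splitlines()

    # Keep the text before the last separator in each token; tokens without
    # a separator are kept unchanged. Line breaks are preserved.
    plain_lines = [
        " ".join(token.rsplit(sep, 1)[0] for token in line.split())
        for line in tagged_lines
    ]

    # "w" truncates an existing output file, so rerunning does not append.
    with open(output_path, "w", encoding="utf-8") as fd:
        fd.write("\n".join(plain_lines) + "\n")
    return output_path


# Example call (placeholder paths; point them at your own files):
# remove_pos_from_file(r"Tagged_Hindi_Corpus.txt", r"only_hindi.txt")
```

If the real corpus uses a different separator (for example `/`), pass it as `sep`.
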
## Detailed explanation of script

The script works as follows:

- Open the tagged Hindi corpus file (`Tagged_Hindi_Corpus.txt`).
- Tokenize the data.
- Create two empty lists.
- Get all the categories from the POS tags.
- Get all the Hindi words.
- Concatenate the words.
- Write the words to the `only_hindi.txt` file.

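The steps above can be sketched as a single function. This is an illustration under assumptions, not the author's code: it assumes whitespace-separated tokens of the form `word_TAG`, and the name `strip_pos_tags` is hypothetical. The two lists correspond to step 3: one collects the Hindi words and the other collects each distinct POS category.

```python
def strip_pos_tags(tagged_text, sep="_"):
    """Remove POS tags from tagged text.

    Assumption: whitespace-separated tokens of the form "word<sep>TAG"
    (hypothetical format). Returns (plain_text, categories), where
    categories lists each distinct POS tag in order of first appearance.
    """
    words_by_line = []  # list 1: the Hindi words, grouped by input line
    categories = []     # list 2: the distinct POS categories seen
    for line in tagged_text.splitlines():
        words = []
        for token in line.split():                    # tokenize on whitespace
            word, found, tag = token.rpartition(sep)  # split at the LAST separator
            if not found:                             # no tag attached: keep the token
                word, tag = token, None
            words.append(word)
            if tag is not None and tag not in categories:
                categories.append(tag)
        words_by_line.append(" ".join(words))         # concatenate this line's words
    return "\n".join(words_by_line), categories
```

Splitting with `rpartition` at the last separator keeps a word intact even if it contains `_` itself, as long as the tag does not. The membership test on a plain list is linear, which is adequate for a small tag set.
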
## Input

![Input](Input.png)

## Output

![Output](Output.png)

## Author(s)

- This code was written by Sanya Devansh Zaveri ([GitHub profile](https://github.com/zaverisanya)).

## Disclaimers

There are no disclaimers for this script.
