<a href="https://colab.research.google.com/github/oshinanika/BDJOBS_SCRAPER-Selenium_BeautifulSoup/blob/master/Simple_NLP_App_She_Thinks_Code.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Install spaCy, Streamlit & pyngrok

In [None]:
!pip install spacy

In [None]:
!pip install spacy-streamlit

In [None]:
!python3 -m spacy download en_core_web_sm

In [None]:
!pip install pyngrok

### Save the app as a file in this Colab session

In [None]:
%%writefile app.py
# Core packages
import streamlit as st

# NLP packages
import spacy
import spacy_streamlit as ss

# Load the small English pipeline by its full name
# (the 'en' shortcut no longer works in spaCy v3)
nlp = spacy.load('en_core_web_sm')


def main():
  "NLP app with Streamlit"
  st.title("NLP APP")
  menu = ["Home", "Tokens", "NER", "Query"]
  choice = st.sidebar.selectbox("Menu", menu)

  if choice == "Home":
    st.subheader("Visualizer")
    raw_text = st.text_area("Enter text here")
    docx = nlp(raw_text)
    if st.button("Visualize"):
      ss.visualize_parser(docx)
  

  elif choice == "Tokens":
    st.subheader("Tokenizer")
    raw_text = st.text_area("Enter text here"," ")
    docx = nlp(raw_text)
    if st.button("Tokenize"):
      ss.visualize_tokens(docx, attrs=['idx', 'text', 'pos_', 'lemma_', 'like_num'])

  elif choice == "NER":
    st.subheader("Named Entity Recognizer")
    raw_text = st.text_area("Enter text here"," ")
    docx = nlp(raw_text)
    if st.button("NER"):
      ss.visualize_ner(docx, labels=nlp.get_pipe('ner').labels)

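  elif choice == "Query":
    st.subheader("Query on Spacy")
    raw_text = st.text_area("Enter text here"," ")
    # spacy.explain expects a bare tag or label such as "GPE" or "VERB",
    # so strip the surrounding whitespace before looking it up
    answer = spacy.explain(raw_text.strip())
    if st.button("Explain"):
      st.write(answer if answer else "No description found for this term.")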


if __name__ == '__main__':
  main()

Overwriting app.py


In [None]:
#check if file is saved
!ls

app.py	sample_data


### spaCy docs
https://spacy.io/usage/spacy-101


### Streamlit docs
https://docs.streamlit.io/en/stable/api.html

### spacy-streamlit
https://github.com/explosion/spacy-streamlit


### Visualize_tokens
https://spacy.io/api/token#attributes
  + idx = Character offset of the token within the parent document (the index of the token's first character).
  + text = Verbatim text content.
  + lemma_ = Base form of the token, with no inflectional suffixes.
  + pos_ = Coarse-grained part-of-speech from the Universal POS tag set (https://universaldependencies.org/docs/u/pos/), e.g. noun, verb, adjective.
  + tag_ = Fine-grained part-of-speech (https://github.com/explosion/spaCy/blob/master/spacy/lang/en/tag_map.py), e.g. verb-past, verb-present-3rd, verb-base.
  + dep_ = Syntactic dependency relation.
  + head = The syntactic parent, or “governor”, of this token.
  + ent_type_ = Named entity type.
  + ent_iob_ = IOB code of named entity tag. “B” means the token begins an entity, “I” means it is inside an entity, “O” means it is outside an entity, and "" means no entity tag is set.
  + shape_ = Transform of the token's string, to show orthographic features. Alphabetic characters are replaced by x or X, numeric characters by d, and sequences of the same character are truncated after length 4. For example, "Xxxx" or "dd".
  + is_alpha, is_digit, is_ascii, is_punct = Boolean flags: does the token consist of alphabetic characters, digits, ASCII characters, or punctuation?
  + like_num = Does the token resemble a number? e.g. “10.9”, “10”, “ten”.
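
To make the `idx` and `shape_` descriptions above concrete, here is a minimal pure-Python sketch that approximates both for whitespace-separated words. The helper names `word_shape` and `tokens_with_offsets` are hypothetical, and this is not spaCy's tokenizer (which also splits off punctuation); it only illustrates the rules described in the list.

```python
import re

def word_shape(text):
    """Approximate shape_: letters -> x/X, digits -> d, runs capped at 4."""
    shape = []
    last = ""
    run = 0
    for char in text:
        if char.isalpha():
            shape_char = "X" if char.isupper() else "x"
        elif char.isdigit():
            shape_char = "d"
        else:
            shape_char = char  # punctuation and other characters are kept as-is
        run = run + 1 if shape_char == last else 1
        last = shape_char
        if run <= 4:
            shape.append(shape_char)
    return "".join(shape)

def tokens_with_offsets(text):
    """Return (idx, text, shape) per word; assumption: words are split on whitespace only."""
    return [(m.start(), m.group(), word_shape(m.group()))
            for m in re.finditer(r"\S+", text)]

for idx, word, shape in tokens_with_offsets("Streamlit costs 10.9 dollars"):
    print(idx, word, shape)
```

In the app itself these values come straight from the `Token` objects, so a helper like this is only useful for understanding what the table columns mean.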

### Visualize_ner
https://spacy.io/usage/processing-pipelines

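Labels used by spaCy's English models include:

* GPE = Geopolitical entity (countries, cities, states)
* LOC = Non-GPE location (mountain ranges, bodies of water)
* TIME = Time smaller than a day
* EVENT = Named event (hurricanes, battles, wars, sports events)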




### Dependency Parsing & POS tags
https://spacy.io/api/annotation

### Authenticate ngrok with the authtoken from your ngrok dashboard

In [None]:
!ngrok authtoken XXXXXXXXX

Authtoken saved to configuration file: /root/.ngrok2/ngrok.yml


In [None]:
!ngrok

### Run Streamlit by any one method from below

In [None]:
# Run the app in the background so the notebook stays usable while you edit app.py

#!nohup streamlit run app.py

!streamlit run app.py&>/dev/null&
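
The same background launch can also be done from Python rather than with a shell command. The sketch below is one possible approach using the standard `subprocess` module; `start_in_background` is a hypothetical helper name, and a placeholder command stands in for Streamlit so the cell runs anywhere.

```python
import subprocess
import sys

def start_in_background(command):
    """Start a command without blocking and discard its output, like `&>/dev/null &`."""
    return subprocess.Popen(
        command,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )

# Placeholder command so the sketch is runnable as-is; in this notebook you
# would pass ["streamlit", "run", "app.py"] instead.
process = start_in_background([sys.executable, "-c", "print('app placeholder')"])
print("started process with PID", process.pid)
process.wait()  # only for the placeholder; a real server keeps running
```

Keeping the returned `Popen` object around makes it possible to stop the app later with `process.terminate()`.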

### Check if Streamlit is running
`pgrep` looks through the currently running processes and lists the IDs of those whose name matches `streamlit`.

In [None]:
!pgrep streamlit

544
558
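
A process ID only shows that the process exists, not that the app is serving requests. As a complementary check, this minimal sketch tests whether anything accepts TCP connections on port 8501 (the port tunnelled later in this notebook). `port_is_open` is a hypothetical helper, not part of Streamlit or pyngrok.

```python
import socket

def port_is_open(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise
        return sock.connect_ex((host, port)) == 0

print("Streamlit reachable on 8501:", port_is_open(8501))
```

Streamlit may need a few seconds to start, so a `False` right after launching does not necessarily mean the launch failed.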


### Public URL for our App

In [None]:
from pyngrok import ngrok

In [None]:
publ_url = ngrok.connect(8501)

In [None]:
publ_url

'http://d56d921885c1.ngrok.io'
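
The output above is a plain string. Newer pyngrok releases (5.0 and later, as far as I know) return a tunnel object with a `public_url` attribute instead. A small sketch that copes with both cases follows; `public_url_of` is a hypothetical helper, and `FakeTunnel` is only a stand-in so the example runs without ngrok.

```python
def public_url_of(tunnel):
    """Return the URL whether `tunnel` is a plain string or has a `.public_url`."""
    return getattr(tunnel, "public_url", tunnel)

class FakeTunnel:
    """Stand-in for a tunnel object (assumption: it exposes `public_url`)."""
    def __init__(self, public_url):
        self.public_url = public_url

# Both shapes of return value give back the same kind of result
print(public_url_of("http://example.ngrok.io"))
print(public_url_of(FakeTunnel("http://example.ngrok.io")))
```

In the notebook, `public_url_of(publ_url)` would give the address to open in a browser regardless of the installed pyngrok version.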

### Kill this tunnel
(The tunnel must be killed from this same Colab notebook.)

In [None]:
ngrok.kill()