# Natural Language Processing beginner's guide
author: Yusu Qian, New York University

Hi all, today I’m going to share some of my experience getting started in natural language processing (NLP). Disclaimer: this comes from someone still new to the field, and my experience mostly applies to those with little computer science background.

### What is NLP?

Natural language processing sits at the intersection of computer science and linguistics. It focuses on helping computers process and understand human languages. Some of the most popular tasks in natural language processing are question answering, sentiment analysis, and summarization.

### How to get started with NLP?

Textbook

I became interested in this subfield of computer science last fall, just as I was starting to learn to program. So I chose to build a foundation by reading introductory books like [Natural Language Processing with Python](https://www.nltk.org/book/), which is designed for readers with no coding experience at all. The most famous textbook is probably [Speech and Language Processing](https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf), whose draft third edition is freely available online. This book doesn’t teach you how to code; it gives a broad survey of NLP theory. I think reading the first half carefully is enough for beginners; you can always come back to the second half later when you are looking for something in particular.

Courses

There are some online courses on NLP, including [CS224n](http://web.stanford.edu/class/cs224n/) from Stanford, probably the most famous NLP course, and many others on Coursera. In the spring semester, CDS offers two NLP courses, Text as Data and Natural Language Understanding (NLU), both of which usually open up to CUSP students one or two weeks before the semester starts. So if you are interested in them, make sure to check Albert every day to see whether the holds have been lifted for students outside CDS. I’ll be grading NLU, so I look forward to seeing you there. If the class at CDS is full, check out the same class offered by the linguistics department.

Packages, API, demos

If you are interested in doing a data science project that involves NLP but is not itself an NLP project, you can check out APIs developed by tech companies, such as the Twitter API, which lets you collect real-time tweets, the Baidu API, which lets you do NLP tasks such as auto-correction with only a few lines of code, and many other APIs that you can easily find online and use for free. There are also demo websites that give you a taste without any coding, for example, the GPT-2 generator.

NLTK is the first package you'll come across as a beginner. Say you want to find every word that fills the slot in the pattern 'a ___ boy' in a text.

In [7]:
import nltk
from nltk.corpus import gutenberg

# Opens the interactive downloader; enter "d", then "gutenberg", then "q".
nltk.download()

NLTK Downloader
---------------------------------------------------------------------------
    d) Download   l) List    u) Update   c) Config   h) Help   q) Quit
---------------------------------------------------------------------------
Downloader> d

Download which package (l=list; x=cancel)?
  Identifier> gutenberg
    Downloading package gutenberg to /root/nltk_data...
      Unzipping corpora/gutenberg.zip.

---------------------------------------------------------------------------
    d) Download   l) List    u) Update   c) Config   h) Help   q) Quit
---------------------------------------------------------------------------
Downloader> q


True

In [0]:
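# Wrap the tokenized novel in a Text object so it can be searched by pattern.
moby = nltk.Text(gutenberg.words('melville-moby_dick.txt'))

In [9]:
# Each <...> matches one token; the parentheses mark the token to print.
moby.findall(r"<a>(<.*>)<boy>")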

black
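
To make the pattern less mysterious, here is a minimal plain-Python sketch of the same idea: slide a three-token window over the text and keep the middle word whenever the outer two match. This is only an illustration, not how NLTK implements `findall`; the function name `slot_fillers` and the sample sentence are made up for this example.

```python
def slot_fillers(tokens, left, right):
    """Return every token that appears directly between `left` and `right`."""
    fillers = []
    # Slide a three-token window over the text.
    for before, middle, after in zip(tokens, tokens[1:], tokens[2:]):
        if before == left and after == right:
            fillers.append(middle)
    return fillers


# Made-up sentence for illustration, not taken from Moby Dick.
sample = "a small boy met a tall boy near a big old house".split()
print(slot_fillers(sample, "a", "boy"))  # prints ['small', 'tall']
```

Note that "a big old house" contributes nothing, because the window only matches when exactly one token sits between the two fixed words.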


Below is an example of using Baidu API to analyze sentiment.

In [2]:
!pip install baidu-aip

Collecting baidu-aip
  Downloading https://files.pythonhosted.org/packages/bf/de/0e770c421bd70b0b59d59d1bcf70139cf0ad4263102a7fc2973c6187174a/baidu-aip-2.2.18.0.tar.gz
Building wheels for collected packages: baidu-aip
  Building wheel for baidu-aip (setup.py) ... done
  Created wheel for baidu-aip: filename=baidu_aip-2.2.18.0-cp27-none-any.whl size=15655 sha256=5014f047302a005e0e2280a61a3a89a4f4d5c3b6311a46629ffebe277c57cfca
  Stored in directory: /root/.cache/pip/wheels/5e/f3/20/9567d96b1140f13546bb3e059827cba0d575e213e8ee87f5ea
Successfully built baidu-aip
Installing collected packages: baidu-aip
Successfully installed baidu-aip-2.2.18.0


In [0]:
from aip import AipNlp
# Fill in the credentials for your own Baidu application.
APP_ID = ''
API_KEY = ''
SECRET_KEY = ''

client = AipNlp(APP_ID, API_KEY, SECRET_KEY)

In [5]:
text = 'I am interested in this workshop.'
client.sentimentClassify(text)

{u'items': [{u'confidence': 0.115521,
   u'negative_prob': 0.455776,
   u'positive_prob': 0.544224,
   u'sentiment': 1}],
 u'log_id': 5766633508630618157,
 u'text': u'I am interested in this workshop.'}

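In other words, the Baidu API is saying: maybe this is positive, but I'm not sure. The positive probability is only about 0.54 and the confidence is about 0.12.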
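
That reading of the output can be spelled out with a small helper. This is only a sketch: it relies solely on the fields visible in the response above, the function name `describe_sentiment` is hypothetical, and the `min_confidence` cutoff of 0.5 is an arbitrary value chosen for illustration rather than anything recommended by Baidu.

```python
def describe_sentiment(response, min_confidence=0.5):
    """Summarize the first item of a response shaped like the one above."""
    item = response["items"][0]
    positive = item["positive_prob"]
    negative = item["negative_prob"]
    label = "positive" if positive >= negative else "negative"
    # min_confidence is a hypothetical cutoff chosen for this sketch.
    if item["confidence"] >= min_confidence:
        certainty = "confident"
    else:
        certainty = "not confident"
    return "{} (p={:.2f}), {}".format(label, max(positive, negative), certainty)


# The response shown above, copied by hand so no API call is needed.
response = {
    "items": [{
        "confidence": 0.115521,
        "negative_prob": 0.455776,
        "positive_prob": 0.544224,
        "sentiment": 1,
    }],
    "log_id": 5766633508630618157,
    "text": "I am interested in this workshop.",
}
print(describe_sentiment(response))  # prints positive (p=0.54), not confident
```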

AllenNLP also offers useful online demos, such as its dependency parser: https://demo.allennlp.org/dependency-parsing

You can use code released by OpenAI to play with GPT-2, and fine-tune it for your purposes. For simplicity I'm showing you an online demo: https://talktotransformer.com/

### Research

If you are into research, you should definitely take Natural Language Understanding next semester. If you can’t enroll in it, you can always audit it, as the lectures are streamed online. Please contact the TAs in the first class for more information. NLU guides students through their first NLP research project, from coming up with a topic to reading the literature, designing experiments, and finally writing a paper. Students are asked to form teams of three or four to write a paper and present it to the whole class. If you are interested in reading papers in this field, you can go to arXiv or the ACL Anthology, and check out the websites of workshops if you are interested in a specific task.

For students without a strong computer science background, I’d generally advise against approaching faculty in CS or CDS about a research collaboration, unless you have a really cool idea. If you have a solid foundation, keep an eye on Wasserman job postings, as opportunities to participate in research projects are sometimes posted there, or you can talk to a faculty member directly. Don’t be discouraged if you are rejected, as it happens all the time.

### Where to publish a paper?

When you have a paper, either from a class that requires a course paper or from collaborating with other students or researchers, you can submit it to conferences or put it on arXiv. Note that if you post it on [arXiv](https://arxiv.org/) under your names, some conferences' anonymity rules may prevent you from submitting it later, so check the venue's policy first. Submitting is free, and grants are sometimes available to student attendees. There are also summits held by tech companies such as Facebook. Those are mostly non-archival and more of a chance to meet new people and share ideas. They are great opportunities both for those seeking jobs and for those interested in getting to know more researchers and their work.