# The role of vocabulary acquisition in English learning
## Academic Writing - Final Project

### Introduction

In this piece of work, I focus on the role of vocabulary acquisition in English learning and try to answer the question of whether vocabulary is important in learning languages. However, the main aim of this essay is to practise using tools such as the [Natural Language Toolkit (NLTK)](https://www.nltk.org/) and the [BeautifulSoup library](https://www.dataquest.io/blog/web-scraping-python-using-beautiful-soup/) for web scraping, together with Python 3. At some points, the essay may therefore deal more with linguistic analysis than with the role of vocabulary in learning languages. The essay mostly concerns second language acquisition, since native speakers of a given language learn new vocabulary differently from foreign language learners.


### Chapter 1. 

At first glance, the question posed in the title may seem abstract, and indeed it is. As David A. Wilkins, a British linguist, said:

>without grammar, very little can be conveyed; 
>without vocabulary nothing can be conveyed.

Therefore, without vocabulary, there is no language. It is not even possible to imagine saying that you know a language when you do not actually know a single word. __Vocabulary is the foundation of a language__. One could say that grammar is the most important area of learning; however, let's come back to Wilkins's quote: we can communicate even if we know just a few words and have no understanding of grammar.

There are various ways to learn new vocabulary: writing, reading and listening, the basic language skills. Learners are free to choose whichever form helps them achieve their goal faster and more effectively. Let's look at an example showing the differences in effectiveness between the aforementioned skills.


In [20]:
import matplotlib.pyplot as plt
%matplotlib notebook

# Share of each skill in percent (the three values sum to 100)
slices = [27, 30, 43]
titles = ['Listening', 'Writing', 'Reading']
explode = (0, 0, 0.1)  # pull the 'Reading' slice out slightly
colors = ['m', 'b', 'g']  # previously defined but never passed to plt.pie

plt.title("The best way to enlarge your vocabulary")
plt.pie(slices, labels=titles, colors=colors, startangle=140, shadow=True, explode=explode, autopct='%1.1f%%')
plt.show()

[Figure: pie chart "The best way to enlarge your vocabulary" with slices Listening (27%), Writing (30%) and Reading (43%)]

We can see from this pie chart that reading is the most beneficial and the fastest way to improve your vocabulary. It does not matter whether you are a beginner or already an advanced learner; while reading, you come across various words. You have a few options here:
1. You look up the words you do not know in a dictionary.
2. You guess the meaning from the context.
3. You can just give up and find something else to read.

No matter which option you choose (but let's skip the last one), you are still able to memorise those words, since you work with them, see them and focus on them for a while. From [TextInspector](https://textinspector.com/vocabulary-in-language-learning/) we find out that:
> This connection between vocabulary size and second language attainment has been widely researched over the years. One of the most interesting of these was a 2010 study that discovered that a surprising 64% of variance in the reading score was due to vocabulary size.

Reading can not only benefit you but also entertain you.

Moving on to another skill: writing. It is a way to practise your vocabulary and use words you have previously learnt. What is more, a larger vocabulary helps you express yourself more precisely. According to the chart, the least beneficial way to improve your vocabulary is listening, although this is debatable. The problem with listening is that, because you cannot see a particular piece of vocabulary, it is harder to recognise and decode the sounds, especially when you are at the very beginning of your adventure with a foreign language. Most people are visual learners, so having to extract the meaning from sound alone does not make things any easier.

### Chapter 2.

Now that we have discussed the possible ways to enlarge our vocabulary, let's move on to some analysis. In this part I am going to analyse the article [*Learning Styles and Vocabulary Acquisition in Second Language: How the Brain Learns*](https://www.frontiersin.org/articles/10.3389/fpsyg.2015.01800/full#B23), written by Manuela Macedonia. I will examine which words are the most frequent in the article, apply POS-tagging, and count how many words each paragraph consists of. Basically, the main aim of this chapter is to work through natural language processing toolkits.

In [21]:
import requests
page = requests.get("https://www.frontiersin.org/articles/10.3389/fpsyg.2015.01800/full#B23")

In [22]:
from bs4 import BeautifulSoup
soup = BeautifulSoup(page.content, 'html.parser')

In [24]:
paragraphs = soup.find_all('p')

In [25]:
import nltk

In [76]:
for parts in paragraphs:
    text = parts.get_text().strip()
    if len(text) > 0:
        print("Each paragraph after web scraping: ", text)

Each paragraph after web scraping:  Impact Factor 2.067 | CiteScore 3.2More on impact ›
Each paragraph after web scraping:  University of Navarra, Spain
Each paragraph after web scraping:  University of Groningen, Netherlands
Each paragraph after web scraping:  The editor and reviewers' affiliations are the latest provided on their Loop research profiles and may not reflect their situation at the time of review.
Each paragraph after web scraping:  Suggest a Research Topic >
Each paragraph after web scraping:  Suggest a Research Topic >
Each paragraph after web scraping:  In recent years, foreign language education has been focussing on learning styles. However, despite the quantity of articles and practice books, websites on the topic, and investment in teacher training, there is no empirical evidence for the existence of learning styles. Furthermore, if one agrees that it is the brain that learns, there should be indicators in the brain for the existence of learning styles, anatomical
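
The output above shows that `soup.find_all('p')` also returns page furniture such as the impact factor line and the reviewers' affiliations. A minimal sketch of one way to keep only the article body is to discard very short paragraphs. The helper name `keep_body_paragraphs` and the threshold of 30 words are assumptions chosen for illustration, not part of the original analysis, and a length cut-off is only a rough heuristic that may also drop short genuine paragraphs.

```python
def keep_body_paragraphs(texts, min_words=30):
    """Return only the paragraphs that contain at least `min_words` words.

    `texts` is a list of plain strings, for example the result of calling
    get_text().strip() on every <p> element found by BeautifulSoup.
    Words are counted by splitting on whitespace.
    """
    return [t for t in texts if len(t.split()) >= min_words]


# A small sample based on the scraped output shown above.
sample = [
    "University of Navarra, Spain",
    "Suggest a Research Topic >",
    "In recent years, foreign language education has been focussing on "
    "learning styles. However, despite the quantity of articles and practice "
    "books, websites on the topic, and investment in teacher training, there "
    "is no empirical evidence for the existence of learning styles.",
]

body = keep_body_paragraphs(sample)
print(len(body), "paragraph(s) kept")
```

In the notebook, the list passed in could be built with `[p.get_text().strip() for p in paragraphs]`.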

In [78]:
paragraphs = soup.find_all('p')

for parts in paragraphs:
    text = parts.get_text().strip()
    # FreqDist counts the items of whatever sequence it is given; a raw string
    # is a sequence of characters, so this counts characters, not words.
    fd = nltk.FreqDist(text)
    if len(text) > 0:
        print("The most common characters in each paragraph:" , fd.most_common(10))

The most common characters in each paragraph: [(' ', 8), ('c', 4), ('t', 4), ('o', 4), ('a', 3), ('r', 3), ('e', 3), ('m', 2), ('p', 2), ('2', 2)]
The most common characters in each paragraph: [('a', 4), ('i', 3), ('r', 3), (' ', 3), ('n', 2), ('v', 2), ('U', 1), ('e', 1), ('s', 1), ('t', 1)]
The most common characters in each paragraph: [('n', 5), ('e', 4), ('i', 3), ('r', 3), (' ', 3), ('s', 2), ('t', 2), ('o', 2), ('U', 1), ('v', 1)]
The most common characters in each paragraph: [(' ', 24), ('e', 20), ('t', 14), ('i', 13), ('r', 12), ('o', 10), ('a', 10), ('h', 6), ('n', 6), ('s', 6)]
The most common characters in each paragraph: [(' ', 4), ('e', 3), ('g', 2), ('s', 2), ('a', 2), ('c', 2), ('S', 1), ('u', 1), ('t', 1), ('R', 1)]
The most common characters in each paragraph: [(' ', 4), ('e', 3), ('g', 2), ('s', 2), ('a', 2), ('c', 2), ('S', 1), ('u', 1), ('t', 1), ('R', 1)]
The most common characters in each paragraph: [(' ', 185), ('e', 132), ('t', 93), ('i', 90), ('n', 86), ('a', 80), ('s', 72), ('o', 67), ('r', 65)
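
The counts above are per character rather than per word, because `nltk.FreqDist` counts the items of whatever sequence it is given, and a string is a sequence of characters. A minimal sketch of counting words instead is shown below. To avoid any extra downloads it uses only the standard library: `collections.Counter` stands in for `nltk.FreqDist` (which is itself a `Counter` subclass), and a simple regular expression stands in for a tokenizer. The function name `most_common_words` and the sample sentence are assumptions for illustration, and stop words such as *the* are not removed.

```python
import re
from collections import Counter


def most_common_words(text, n=10):
    """Return the `n` most frequent lower-cased words in `text`."""
    # \w+ matches runs of letters, digits and underscores, so punctuation
    # and whitespace are dropped before counting.
    words = re.findall(r"\w+", text.lower())
    return Counter(words).most_common(n)


sample = "The brain learns words, and the brain stores words."
print(most_common_words(sample, 3))
```

Within the notebook itself, passing a list of tokens to `nltk.FreqDist`, for instance the result of `nltk.word_tokenize(text)`, would give word counts in the same way.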

In [81]:
paragraphs = soup.find_all('p')

for parts in paragraphs:
    text = parts.get_text().strip()
    if len(text) > 0:
        tokens = nltk.word_tokenize(text)
        tagged = nltk.pos_tag(tokens)
        print("POS-tagged paragraph: ", tagged)
        

POS-tagged paragraph:  [('Impact', 'NNP'), ('Factor', 'NNP'), ('2.067', 'CD'), ('|', 'NNP'), ('CiteScore', 'NNP'), ('3.2More', 'CD'), ('on', 'IN'), ('impact', 'NN'), ('›', 'NN')]
POS-tagged paragraph:  [('University', 'NNP'), ('of', 'IN'), ('Navarra', 'NNP'), (',', ','), ('Spain', 'NNP')]
POS-tagged paragraph:  [('University', 'NNP'), ('of', 'IN'), ('Groningen', 'NNP'), (',', ','), ('Netherlands', 'NNP')]
POS-tagged paragraph:  [('The', 'DT'), ('editor', 'NN'), ('and', 'CC'), ('reviewers', 'NNS'), ("'", 'POS'), ('affiliations', 'NNS'), ('are', 'VBP'), ('the', 'DT'), ('latest', 'JJS'), ('provided', 'VBN'), ('on', 'IN'), ('their', 'PRP$'), ('Loop', 'NNP'), ('research', 'NN'), ('profiles', 'NNS'), ('and', 'CC'), ('may', 'MD'), ('not', 'RB'), ('reflect', 'VB'), ('their', 'PRP$'), ('situation', 'NN'), ('at', 'IN'), ('the', 'DT'), ('time', 'NN'), ('of', 'IN'), ('review', 'NN'), ('.', '.')]
POS-tagged paragraph:  [('Suggest', 'NNP'), ('a', 'DT'), ('Research', 'NNP'), ('Topic', 'NNP'), ('>', '
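
The tagged output can be made more useful for a learner by grouping the words under broad parts of speech. The sketch below assumes the Penn Treebank tags that `nltk.pos_tag` returns by default for English, in which noun tags start with `NN`, verb tags with `VB` and adjective tags with `JJ`. The function name `group_by_pos` is hypothetical, and the sample is a shortened excerpt of the output above.

```python
from collections import defaultdict

# Two-letter prefixes of Penn Treebank tags mapped to broad categories.
PREFIXES = {"NN": "noun", "VB": "verb", "JJ": "adjective"}


def group_by_pos(tagged):
    """Group (word, tag) pairs into nouns, verbs and adjectives.

    Words whose tag belongs to none of the three categories are skipped.
    """
    groups = defaultdict(list)
    for word, tag in tagged:
        category = PREFIXES.get(tag[:2])
        if category is not None:
            groups[category].append(word)
    return dict(groups)


# Shortened excerpt of a tagged paragraph from the output above.
tagged = [('The', 'DT'), ('editor', 'NN'), ('and', 'CC'), ('reviewers', 'NNS'),
          ('are', 'VBP'), ('the', 'DT'), ('latest', 'JJS'), ('provided', 'VBN')]
print(group_by_pos(tagged))
```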

In [89]:
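paragraphs = soup.find_all('p')

from nltk.tokenize import WordPunctTokenizer

# WordPunctTokenizer splits text into runs of word characters and runs of
# punctuation, so the counts below include punctuation marks as tokens.
tokenizer = WordPunctTokenizer()
for parts in paragraphs:
    text = parts.get_text().strip()
    if len(text) > 0:
        print("The number of words in each paragraph: ", len(tokenizer.tokenize(text)))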

The number of words in each paragraph:  13
The number of words in each paragraph:  5
The number of words in each paragraph:  5
The number of words in each paragraph:  27
The number of words in each paragraph:  5
The number of words in each paragraph:  5
The number of words in each paragraph:  220
The number of words in each paragraph:  573
The number of words in each paragraph:  325
The number of words in each paragraph:  48
The number of words in each paragraph:  133
The number of words in each paragraph:  172
The number of words in each paragraph:  412
The number of words in each paragraph:  354
The number of words in each paragraph:  137
The number of words in each paragraph:  28
The number of words in each paragraph:  41
The number of words in each paragraph:  6
The number of words in each paragraph:  72
The number of words in each paragraph:  9
The number of words in each paragraph:  40
The number of words in each paragraph:  9
The number of words in each paragraph:  59
The number

### Chapter 3.

The previous chapter presented a linguistic analysis of text scraped from a website. How is this supposed to be helpful, you may ask. With regard to the topic of this essay, the role of vocabulary acquisition in English learning, we can point to a few applications:

1. From a list of the most common words in each paragraph, we can pick out those we do not know and look them up (as mentioned in Chapter 1).
2. The same applies to POS-tagging: we can see examples of words and pick out those we do not know, but here we also get information on each word's part of speech in the sentence. With that, we can guess the meaning of a word from the context more easily, given that we already know whether it is a noun, a verb or an adjective.
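
The first application can be sketched in a few lines: given the frequent words of a paragraph and a list of words the learner already knows, keep only the unfamiliar ones for looking up. Both the function name `words_to_look_up` and the word lists are hypothetical and serve only as an illustration.

```python
def words_to_look_up(frequent_words, known_words):
    """Return the frequent words the learner does not know yet.

    The comparison ignores case, the original order is kept, and each
    word is reported only once.
    """
    known = {w.lower() for w in known_words}
    seen = set()
    unknown = []
    for word in frequent_words:
        key = word.lower()
        if key not in known and key not in seen:
            seen.add(key)
            unknown.append(word)
    return unknown


# Hypothetical inputs: frequent words from a paragraph and a learner's vocabulary.
frequent = ["learning", "styles", "empirical", "brain", "anatomical", "Brain"]
known = ["learning", "brain", "styles"]
print(words_to_look_up(frequent, known))
```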

Furthermore, the analysis is consistent with the data displayed in the pie chart: reading and analysing sentences helps us learn new words, or simply extends our current vocabulary so that we sound more like a *native speaker*.

