##### Information from: https://ithelp.ithome.com.tw/articles/10261285

## NLP Web

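To build a pattern from a regular expression, we first import the re module. We then call re.sub() and pass it three required arguments:
- pattern: the regular expression; here we can use r"<.*?>"
- replacement_text: the string that replaces every match of the pattern (named repl in the re documentation); here simply the empty string ''
- input: the string to be searched (named string in the re documentation)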
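The question mark in r"<.*?>" matters. As a minimal sketch (the one-line `snippet` string is made up for illustration), the following compares the greedy pattern `<.*>` with the lazy pattern `<.*?>`:

```python
import re

# Hypothetical one-line HTML fragment used only for this comparison
snippet = "<p>Here's a <a href=\"#\">link</a> to follow.</p>"

# Greedy: .* grabs as much as possible, so a single match runs from the
# first "<" to the last ">" and the whole line is removed.
greedy = re.sub(r"<.*>", "", snippet)

# Lazy: .*? stops at the first ">" it meets, so each tag is removed on its own.
lazy = re.sub(r"<.*?>", "", snippet)

print(repr(greedy))  # ''
print(repr(lazy))    # "Here's a link to follow."
```

Note that `.` does not match a newline by default, so a tag that spans several lines would be left in place unless the `re.DOTALL` flag is passed.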

In [1]:
import re
from nltk.tokenize import sent_tokenize  # NLTK's sent_tokenize() performs sentence segmentation

In [2]:
raw_text = """
<html>
   <head>
      <title>My Garden - Tomatoes</title>
   </head>
   <body>
   <h1>Garden Tomatoes</h1>
   <p>I decided to plant some tomatoes this spring. They're really taking off and I hope to have lots of tomatoes to give to all my friends and family this summer!</p>
   <p>Here are a few things I like about tomatoes:</p>
   <ol>
      <li>They taste great.</li>
      <li>They're good for me.</li>
      <li>They're easy to grow!</li>
   </ol>
   <p>Here's a picture of my garden:</p>
   <img src="http://www.mygardensite.com/images/my-garden-001.jpg" alt="a picture of my garden" />
   <p>Here's a <a href="http://www.welovetomatoes.com">link</a> to check out more interesting things about tomatoes!</p>
   </body>
</html>
"""


text_no_tags = re.sub(r"<.*?>", '', raw_text)
print(text_no_tags)



   
      My Garden - Tomatoes
   
   
   Garden Tomatoes
   I decided to plant some tomatoes this spring. They're really taking off and I hope to have lots of tomatoes to give to all my friends and family this summer!
   Here are a few things I like about tomatoes:
   
      They taste great.
      They're good for me.
      They're easy to grow!
   
   Here's a picture of my garden:
   
   Here's a link to check out more interesting things about tomatoes!
   




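## Cleaning the blanks
We use the metacharacter \s, which matches whitespace characters such as spaces, tabs, and newlines. Because the redundant whitespace runs are two or more characters long, the pattern can be written as \s{2,}. The code is as follows: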

In [53]:
# to remove redundant whitespaces
text_no_whitespace = re.sub(r"\s{2,}", ' ', text_no_tags)
text_no_whitespace
type(text_no_whitespace)

str
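One detail worth seeing in isolation: \s{2,} only replaces runs of two or more whitespace characters, so a lone tab or newline is left untouched. The sketch below (the `messy` string is an invented sample) contrasts it with \s+ followed by strip(), which may be preferable when every whitespace character should become a plain space:

```python
import re

# Invented sample containing indentation, a single tab, and a trailing newline
messy = "\n   Garden Tomatoes\n   I decided\tto plant   some tomatoes.\n"

# \s{2,} collapses only runs of two or more whitespace characters,
# so the single tab and the final newline survive.
collapsed_runs = re.sub(r"\s{2,}", " ", messy)

# \s+ also replaces single whitespace characters; strip() then removes
# the leading and trailing space.
normalised = re.sub(r"\s+", " ", messy).strip()

print(repr(collapsed_runs))
print(repr(normalised))
```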

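## Sentence Segmentation

In Python, we use the Natural Language Toolkit (NLTK) to help with these processing tasks. The first step is to split the string above into individual sentences, a step known as sentence segmentation. A full stop (.) is a good indicator of where a sentence ends, but there are exceptions: full stops used in abbreviations, such as Mr. Williams or Ph.D. The good news is that the tokenize module in NLTK already defines a function, sent_tokenize(), that performs sentence segmentation: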

In [52]:
# removing double quotes
text = re.sub(r"\"", '', text_no_whitespace)

# breaking text into sentences
text_sentences = sent_tokenize(text)
print(type(text_sentences))

# printing out sentences
for i, sent in enumerate(text_sentences):
    print("Sentence {}: {}".format(i + 1, sent), end = "\n\n")

<class 'list'>
Sentence 1: Facebook under fire over secret teen research
By Jane Wakefield
Technology reporter Published15 September 2021
Girl taking a selfie
IMAGE SOURCE,GETTY IMAGES
Image caption,
Teenage girls can be very conscious of body image - and Instagram can make them feel worse, the internal studies showed
Facebook-owned Instagram has been criticised for keeping secret its internal research into the effect social media had on teenager users.

Sentence 2: According to the Wall Street Journal, its studies showed teenagers blamed Instagram for increased levels of anxiety and depression.

Sentence 3: Campaign groups and MPs have said it is proof the company puts profit first.

Sentence 4: Instagram said the research showed its commitment to understanding complex and difficult issues.

Sentence 5: The Wall Street Journal's report, not disputed by Facebook, finds: A 2019 presentation slide said: We make body-image issues worse for one in three teenage girls
Another slide said tee
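To see why abbreviations make full stops an unreliable boundary, here is a dependency-free sketch that splits on sentence-ending punctuation and then patches the result with a small abbreviation list. The sample text, the `ABBREVIATIONS` set, and `merge_abbreviations()` are all assumptions made for illustration; sent_tokenize() itself relies on a pre-trained Punkt model rather than a hand-written list like this.

```python
import re

text = "Mr. Williams grows tomatoes. He holds a Ph.D. in botany! Does he sell them?"

# Naive rule: a sentence ends at ".", "!" or "?" followed by whitespace.
naive_sentences = re.split(r"(?<=[.!?])\s+", text)

# Hypothetical, far-from-complete abbreviation list
ABBREVIATIONS = {"Mr.", "Mrs.", "Dr.", "Ph.D."}

def merge_abbreviations(pieces):
    """Re-join pieces that were cut right after a known abbreviation."""
    merged, carry = [], ""
    for piece in pieces:
        candidate = carry + " " + piece if carry else piece
        if candidate.split()[-1] in ABBREVIATIONS:
            carry = candidate  # boundary was a false alarm, keep accumulating
        else:
            merged.append(candidate)
            carry = ""
    if carry:
        merged.append(carry)
    return merged

merged_sentences = merge_abbreviations(naive_sentences)
print(naive_sentences)   # 5 pieces, "Mr." is cut off on its own
print(merged_sentences)  # 3 sentences
```

The patch is still fragile: a sentence that genuinely ends in "Ph.D." would be glued to the next one, which is one reason to prefer a trained segmenter.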

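## Tokenisation
Next we split each sentence into smaller units: words. Note that in English a word is usually regarded as the smallest unit that carries meaning, called a token. The process of splitting a string into tokens is word segmentation, also known as tokenisation. Here we bring in another splitting function, word_tokenize():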

In [5]:
#List
from nltk.tokenize import word_tokenize

for i, sent in enumerate(text_sentences):
    print("Sentence {}: {}".format(i + 1, sent))
    tokens = word_tokenize(sent)
    print(tokens, end = "\n\n")

Sentence 1:  My Garden - Tomatoes Garden Tomatoes I decided to plant some tomatoes this spring.
['My', 'Garden', '-', 'Tomatoes', 'Garden', 'Tomatoes', 'I', 'decided', 'to', 'plant', 'some', 'tomatoes', 'this', 'spring', '.']

Sentence 2: They're really taking off and I hope to have lots of tomatoes to give to all my friends and family this summer!
['They', "'re", 'really', 'taking', 'off', 'and', 'I', 'hope', 'to', 'have', 'lots', 'of', 'tomatoes', 'to', 'give', 'to', 'all', 'my', 'friends', 'and', 'family', 'this', 'summer', '!']

Sentence 3: Here are a few things I like about tomatoes: They taste great.
['Here', 'are', 'a', 'few', 'things', 'I', 'like', 'about', 'tomatoes', ':', 'They', 'taste', 'great', '.']

Sentence 4: They're good for me.
['They', "'re", 'good', 'for', 'me', '.']

Sentence 5: They're easy to grow!
['They', "'re", 'easy', 'to', 'grow', '!']

Sentence 6: Here's a picture of my garden: Here's a link to check out more interesting things about tomatoes!
['Here', "'s"
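For comparison, a rough regular-expression tokeniser can be written with the standard library alone. This is only a sketch (`simple_tokenise` is a made-up helper, not part of NLTK), and it shows what word_tokenize() does better: the contraction "They're" is split into three pieces here, whereas word_tokenize() returns 'They' and "'re" as in the output above.

```python
import re

def simple_tokenise(sentence):
    """Split into runs of word characters and single punctuation marks."""
    # \w+ matches letters, digits and underscores; [^\w\s] matches one
    # character that is neither a word character nor whitespace.
    return re.findall(r"\w+|[^\w\s]", sentence)

tokens = simple_tokenise("They're easy to grow!")
print(tokens)  # ['They', "'", 're', 'easy', 'to', 'grow', '!']
```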

In [55]:
#String
import nltk

sentence_data = "The First sentence is about Python. The Second: about Django. You can learn Python, \
Django and Data Analysis here. "

nltk_tokens = nltk.sent_tokenize(sentence_data)
print(nltk_tokens)

['The First sentence is about Python.', 'The Second: about Django.', 'You can learn Python, Django and Data Analysis here.']


### token.lower(): lowercasing

In [6]:
tokenised = ["The", "spectators", "all", "stood", "and", "sang", "the", "national", "anthem"]
# lowercasing each token
tokens_lower = [token.lower() for token in tokenised] 
tokens_lower

['the',
 'spectators',
 'all',
 'stood',
 'and',
 'sang',
 'the',
 'national',
 'anthem']

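## Stemming
In linguistics, a word stem is the most basic, core form of a word. For example, friendships consists of friendship plus the suffix -s, so friendship is its stem; friendship in turn consists of friend plus the suffix -ship, in which case friend is the stem. Stemmers built on different principles or algorithms can therefore produce different results. We illustrate this with three common ones, the Porter, Lancaster, and Snowball stemming algorithms, and compare their output.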

In [7]:
# importing stemmer classes
from nltk.stem import PorterStemmer, LancasterStemmer, SnowballStemmer

tokens = ["the", "spectators", "all", "stood", "and", "sang", "the", "national", "anthem"]

# stemming
port = PorterStemmer()
stemmed_port = [port.stem(token) for token in tokens]

lanc = LancasterStemmer()
stemmed_lanc = [lanc.stem(token) for token in tokens]

snow = SnowballStemmer("english")
stemmed_snow = [snow.stem(token) for token in tokens]

# showing stemmed results
print("Porter: {}".format(stemmed_port)) 
print("Lancaster: {}".format(stemmed_lanc))
print("Snowball: {}".format(stemmed_snow))

Porter: ['the', 'spectat', 'all', 'stood', 'and', 'sang', 'the', 'nation', 'anthem']
Lancaster: ['the', 'spect', 'al', 'stood', 'and', 'sang', 'the', 'nat', 'anthem']
Snowball: ['the', 'spectat', 'all', 'stood', 'and', 'sang', 'the', 'nation', 'anthem']
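The friendships → friendship → friend example can be reproduced with a toy suffix stripper. This is a deliberately naive sketch, not the Porter, Lancaster, or Snowball algorithm; the suffix list and the minimum stem length are arbitrary assumptions.

```python
def strip_suffix_once(word, suffixes=("ship", "s"), min_stem_length=3):
    """Remove the first matching suffix, keeping at least min_stem_length letters."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_length:
            return word[: -len(suffix)]
    return word

def stemming_steps(word):
    """Strip suffixes repeatedly and record every intermediate form."""
    steps = [word]
    while True:
        shorter = strip_suffix_once(steps[-1])
        if shorter == steps[-1]:
            return steps
        steps.append(shorter)

print(stemming_steps("friendships"))  # ['friendships', 'friendship', 'friend']
print(stemming_steps("glass"))        # ['glass', 'glas', 'gla']
```

The second call shows the over-stemming that blind suffix removal causes, which is why real stemmers add conditions to each rule and why their outputs (such as 'spectat' versus 'spect' above) can differ.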


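## Lemmatisation
Clearly, stemming alone does not fully meet our need to reduce inflection, so we turn to a form that better represents the base of a word: the lemma. For example, sings, singing, sang, and sung all share the lemma sing. Below we use the WordNetLemmatizer class from the nltk.stem module to find lemmas. WordNet is a free, publicly available lexical database built at Princeton University.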

In [8]:
import nltk
nltk.download('wordnet')

[nltk_data] Downloading package wordnet to
[nltk_data]     /Users/jerrychien/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

In [9]:
from nltk.stem import WordNetLemmatizer

tokens = ["the", "spectators", "all", "stood", "and", "sang", "the", "national", "anthem"]

lemmatiser = WordNetLemmatizer()
lemmatised = [lemmatiser.lemmatize(token) for token in tokens]
print("lemmatised: {}".format(lemmatised))

lemmatised: ['the', 'spectator', 'all', 'stood', 'and', 'sang', 'the', 'national', 'anthem']
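In the output above, 'stood' and 'sang' come back unchanged because lemmatize() treats every token as a noun unless a part of speech is given (its pos parameter defaults to "n"). The dictionary-based sketch below imitates that behaviour without WordNet; `LEMMA_TABLE` and `toy_lemmatise()` are hypothetical stand-ins covering only a handful of words.

```python
# Hypothetical lookup table keyed by (token, part of speech)
LEMMA_TABLE = {
    ("sings", "v"): "sing",
    ("singing", "v"): "sing",
    ("sang", "v"): "sing",
    ("sung", "v"): "sing",
    ("stood", "v"): "stand",
    ("spectators", "n"): "spectator",
}

def toy_lemmatise(token, pos="n"):
    """Return the lemma for (token, pos), or the token itself if unknown."""
    return LEMMA_TABLE.get((token, pos), token)

print(toy_lemmatise("sang"))           # 'sang'  (looked up as a noun)
print(toy_lemmatise("sang", pos="v"))  # 'sing'  (looked up as a verb)
print(toy_lemmatise("spectators"))     # 'spectator'
```

With NLTK, the corresponding call would be lemmatiser.lemmatize("sang", pos="v"); supplying the part of speech is what lets the lemmatiser map verb forms to their lemma.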


## Stopword Removal
Some words in a sentence contribute little to its meaning, such as a/an, the, and is/are. These are called stop words.

In [10]:
from nltk.corpus import stopwords
nltk.download("stopwords")

# defining stopwords in English
stop_words = set(stopwords.words("english"))

# removing stop words
words_no_stop = [word for word in lemmatised if word not in stop_words]
print("stop words removed: {}".format(words_no_stop))

stop words removed: ['spectator', 'stood', 'sang', 'national', 'anthem']


[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/jerrychien/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
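The filter above works because the tokens were already lowercased. NLTK's English stop word list is all lowercase, so a capitalised "The" would slip through a direct membership test. A small sketch, assuming a hand-written subset of stop words in place of the real list:

```python
# Hypothetical subset standing in for stopwords.words("english")
STOP_WORDS = {"a", "an", "the", "is", "are", "all", "and"}

tokens = ["The", "spectators", "all", "stood", "and", "sang", "the", "national", "anthem"]

# Direct comparison: "The" is not in the lowercase set, so it is kept.
case_sensitive = [t for t in tokens if t not in STOP_WORDS]

# Lowercasing each token before the lookup removes it as intended.
case_insensitive = [t for t in tokens if t.lower() not in STOP_WORDS]

print(case_sensitive)    # ['The', 'spectators', 'stood', 'sang', 'national', 'anthem']
print(case_insensitive)  # ['spectators', 'stood', 'sang', 'national', 'anthem']
```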


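## POS Tagging
Part of speech (POS) and syntactic analysis

In linguistics, words are classified into parts of speech (POS) according to their function and inflection. Common parts of speech include nouns, verbs, adjectives, adverbs, and prepositions. For example, the sentence "In God we trust." consists of a preposition (in) + noun (God) + pronoun (we) + verb (trust), in that order. Its syntax differs from that of "We trust in God.", which is built from a pronoun, a verb, a preposition, and a noun in sequence. Starting from parts of speech and following the rules of grammar, we can analyse the structure of a sentence; this process is called syntactic analysis. The tags used below are listed here: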
- https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html

In [11]:
from nltk import pos_tag
nltk.download("averaged_perceptron_tagger")

tokenised_sent = ["their", "decision", "makes", "no", "economic", "sense"]

# POS tagging
pos_tagged_sent = pos_tag(tokenised_sent)
print("POS tagged sentence:\n{}".format(pos_tagged_sent))

POS tagged sentence:
[('their', 'PRP$'), ('decision', 'NN'), ('makes', 'VBZ'), ('no', 'DT'), ('economic', 'JJ'), ('sense', 'NN')]


[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /Users/jerrychien/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
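The tag abbreviations follow the Penn Treebank tag set linked above. As a reading aid, the sketch below pairs each tag from the printed output with its description; the `PENN_TAGS` dictionary is a hand-copied excerpt covering only these five tags.

```python
# Hand-copied excerpt of the Penn Treebank tag set (only the tags seen above)
PENN_TAGS = {
    "PRP$": "possessive pronoun",
    "NN": "noun, singular or mass",
    "VBZ": "verb, 3rd person singular present",
    "DT": "determiner",
    "JJ": "adjective",
}

# Copied from the pos_tag() output above
pos_tagged_sent = [("their", "PRP$"), ("decision", "NN"), ("makes", "VBZ"),
                   ("no", "DT"), ("economic", "JJ"), ("sense", "NN")]

# Attach a description to every (token, tag) pair
described = [(token, tag, PENN_TAGS.get(tag, "unknown tag"))
             for token, tag in pos_tagged_sent]

for token, tag, description in described:
    print("{:<10}{:<6}{}".format(token, tag, description))
```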


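## Phrase Chunking
The hierarchical description of a sentence's structure can be simple or complex, depending on how we chunk it. We can define chunks according to the grammatical structure of phrases or clauses and have a parser check the syntax step by step (using regular expressions for the matching), producing a parse tree that describes the hierarchy. We demonstrate chunking with two simple grammatical structures, noun phrases and verb phrases: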

In [12]:
from nltk import RegexpParser

In [13]:
# noun phrase: given a word-tokenised sentence
tokenised_sent = ["their", "decision", "makes", "no", "economic", "sense"]

# POS tagging
pos_tagged_sent = pos_tag(tokenised_sent)

# specifying the formal grammar of a noun phrase: "grammar_name: {RegEx}"
np_chunk_grammar = "NP: {<DT>?<JJ>*<NN.?>}"
# building its parser
np_chunk_parser = RegexpParser(np_chunk_grammar)
# chunk parsing a sentence
np_chunked_sent = np_chunk_parser.parse(pos_tagged_sent)

# visualising parsing result
np_chunked_sent.draw()
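The idea of matching a regular expression against the sequence of POS tags can be sketched with the standard library alone. The code below is an illustration of the principle, not RegexpParser's implementation: `regex_chunk()` is a made-up helper, and the verb phrase rule (a verb followed by a noun phrase) is an assumed grammar, since only the noun phrase grammar appears above.

```python
import re

def regex_chunk(tagged_tokens, tag_pattern):
    """Return the word groups whose POS-tag sequence matches tag_pattern."""
    # Encode the tag sequence as one string, e.g. "<PRP$><NN><VBZ>..."
    tag_string = "".join("<{}>".format(tag) for _, tag in tagged_tokens)

    # Character offset at which each token's "<TAG>" begins in tag_string
    starts, offset = [], 0
    for _, tag in tagged_tokens:
        starts.append(offset)
        offset += len(tag) + 2  # +2 for the angle brackets

    chunks = []
    for match in re.finditer(tag_pattern, tag_string):
        first = starts.index(match.start())
        last = sum(1 for start in starts if start < match.end())
        chunks.append([word for word, _ in tagged_tokens[first:last]])
    return chunks

# Tags copied from the pos_tag() output shown earlier
tagged = [("their", "PRP$"), ("decision", "NN"), ("makes", "VBZ"),
          ("no", "DT"), ("economic", "JJ"), ("sense", "NN")]

# Noun phrase: optional determiner, any number of adjectives, then a noun
NP_PATTERN = r"(?:<DT>)?(?:<JJ>)*<NN[^>]?>"
# Verb phrase (assumed rule): a verb followed by a noun phrase
VP_PATTERN = r"<VB[^>]?>" + NP_PATTERN

noun_phrases = regex_chunk(tagged, NP_PATTERN)
verb_phrases = regex_chunk(tagged, VP_PATTERN)
print(noun_phrases)  # [['decision'], ['no', 'economic', 'sense']]
print(verb_phrases)  # [['makes', 'no', 'economic', 'sense']]
```

`<NN[^>]?>` plays the role of `<NN.?>` in the NLTK grammar: it accepts NN plus at most one extra character, so NNS and NNP match as well.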

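## Application example: retrieving information from an article
Having covered syntactic parsing, we now look at a news report and use a series of preprocessing, POS tagging, and chunk analysis techniques to find the key information in the text.
We prepared two modules in advance: tokenise_words.py and chunk_counters.py.
The tokenise_words module performs sentence segmentation and tokenisation on the cleaned string (lowercasing → sentence segmentation → tokenisation).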

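## Bag-of-Words Model (BoW)
A brief look at the "bag" of words

The bag-of-words model represents text based on how often each word occurs. It ignores the order of the words and even the grammatical structure.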

In [20]:
# Define training docs
training_docs = ["Five fantastic fish flew off to find faraway functions.",
                 "Maybe find another five fantastic fish?",
                 "Find my fish with a function please!"]

# features dictionary:

# merge a list of strings into a single string
merged = ' '.join(training_docs)
# Stop word removal, tokenisation and lemmatisation
# (preprocess_text is a user-defined helper that is not defined in this notebook)
tokens = preprocess_text(merged)

features_dict = dict()
index = 0
for token in tokens:
    print("token: {}".format(token))
    # if not a new word
    if token in features_dict:
        continue
    else:
        features_dict[token] = index
        index += 1


NameError: name 'preprocess_text' is not defined
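The cell above fails because preprocess_text() is never defined in the notebook. The sketch below substitutes a simplified stand-in so the bag-of-words steps can run end to end. This `preprocess_text()` is an assumption, not the original helper: it lowercases, keeps alphabetic tokens, and drops a tiny hand-written stop word list, with no lemmatisation. `text_to_bow_vector()` is likewise a hypothetical name.

```python
import re
from collections import Counter

STOP_WORDS = {"a", "my", "off", "to", "with"}  # hypothetical, minimal list

def preprocess_text(text):
    """Simplified stand-in: lowercase, keep alphabetic tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]

def build_features_dict(docs):
    """Map each distinct token to a column index, in order of first appearance."""
    features = {}
    for token in preprocess_text(" ".join(docs)):
        if token not in features:
            features[token] = len(features)
    return features

def text_to_bow_vector(text, features):
    """Count how often each known feature occurs in text."""
    vector = [0] * len(features)
    for token, count in Counter(preprocess_text(text)).items():
        if token in features:
            vector[features[token]] = count
    return vector

training_docs = ["Five fantastic fish flew off to find faraway functions.",
                 "Maybe find another five fantastic fish?",
                 "Find my fish with a function please!"]

features_dict = build_features_dict(training_docs)
bow_vector = text_to_bow_vector("Another five fish find another faraway fish!", features_dict)
print(features_dict)
print(bow_vector)  # [1, 0, 2, 0, 1, 1, 0, 0, 2, 0, 0]
```

Because this stand-in does not lemmatise, "functions" and "function" occupy separate columns; a lemmatising preprocess_text() would merge them.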

## Language Model (LM)
A language model assigns a probability to a piece of text. Many natural language processing scenarios rely on language models:

- Spell correction:
P("I spent five minutes reading the article.") > P("I spent five minutes readnig the article.")
- Speech recognition:
P("I saw a van.") >> P("eyes awe of an")
- Text prediction:
P("Do you want to go to the store") > P("Do you want to go to the gym") > P("Do you want to go to the bank")

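## N-Gram Language Models
Definition: an n-gram is a sequence of n consecutive words. For example, "store" is a 1-gram, "the store" is a 2-gram, "to the store" is a 3-gram, and so on.
Suppose we enter "Do you want to go to the". The language model uses its underlying algorithm to predict what the next word will be. Consider one possibility: "Do you want to go to the store". The language model computes the conditional probability P("store" | "Do you want to go to the").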
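One common way to estimate such a conditional probability is a bigram model, which approximates P("store" | "Do you want to go to the") by P("store" | "the") and estimates it from counts: count("the store") / count("the"). The sketch below applies this maximum-likelihood estimate to a three-sentence toy corpus. The corpus and the helper names `make_ngrams()` and `bigram_probability()` are invented for illustration.

```python
from collections import Counter

# Invented toy corpus, already lowercased and free of punctuation
corpus = [
    "do you want to go to the store",
    "do you want to go to the gym",
    "i want to go to the store",
]

def make_ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

bigram_counts = Counter()   # how often (previous word, word) occurs
context_counts = Counter()  # how often the previous word starts a bigram
for sentence in corpus:
    for previous, word in make_ngrams(sentence.split(), 2):
        bigram_counts[(previous, word)] += 1
        context_counts[previous] += 1

def bigram_probability(previous, word):
    """Maximum-likelihood estimate of P(word | previous)."""
    if context_counts[previous] == 0:
        return 0.0
    return bigram_counts[(previous, word)] / context_counts[previous]

for candidate in ("store", "gym", "bank"):
    print(candidate, bigram_probability("the", candidate))
```

In this corpus "the" is followed by "store" twice and "gym" once, so "store" receives 2/3, "gym" 1/3, and the unseen "bank" 0. Smoothing techniques exist to avoid assigning zero to unseen word pairs.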

In [22]:
from collections import Counter
from nltk.util import ngrams
#from preprocess_text import preprocess, clean_text # user-defined functions

In [44]:
# Load the news article as raw text data
with open("input/NLP-BBC.txt", 'r') as f:
    raw_news = f.read()

text_no_tags = re.sub(r"<.*?>", '', raw_news)
text_no_whitespace = re.sub(r"\s{2,}", ' ', text_no_tags)

In [45]:
from nltk.tokenize import word_tokenize

# text_no_whitespace is one long string, so enumerating it directly yields
# single characters ("F", "a", "c", ...). Split it into sentences first.
for i, sent in enumerate(sent_tokenize(text_no_whitespace)):
    print("Sentence {}: {}".format(i + 1, sent))
    tokens = word_tokenize(sent)
    print(tokens, end = "\n\n")

In [42]:
# ngrams() expects a sequence of tokens. Passing text_sentences directly treats
# every whole sentence as one "unigram", so tokenise each sentence into words first.
news_tokens = [token for sent in text_sentences for token in word_tokenize(sent)]
# obtain all unigrams
news_unigrams = ngrams(news_tokens, 1)
# count occurrences of each unigram
news_unigrams_freq = Counter(news_unigrams)
# review top 5 frequent unigrams in the news
print("Top 5 unigrams:\n{}".format(news_unigrams_freq.most_common(5)))