In [10]:
import os
import spacy
from spacy import displacy
import pandas as pd
import numpy as np

In [8]:
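# The 'en' shortcut link was removed in spaCy 3; load the model by package name.
# Assumes the small English model, which 'en' pointed to by default.
nlp = spacy.load('en_core_web_sm')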

In [12]:
texts = np.load("corpus_arr.npy")
texts = [t.replace('\n', ' ') for t in texts]

In [13]:
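PIPELINE = ['tagger', 'parser', 'ner']
# spacy.load() already returns these components, so nothing has to be created;
# just check that they are present before processing.
missing = [name for name in PIPELINE if name not in nlp.pipe_names]
assert not missing, f"missing pipeline components: {missing}"
docs = [nlp(txt) for txt in texts[:50]]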

In [17]:
# nlp.pipe processes the texts in batches; n_threads is deprecated since spaCy 2.1.
docs = list(nlp.pipe(texts[:50]))

In [24]:
docs[10]

Kitzbuhel, Austria (CNN) "The Terminator" Arnold Schwarzenegger says US President Donald Trump is making a "big mistake" on environmental policy.  Making his annual pilgrimage to the famous ski races in Kitzbuhel, the Austrian muscleman, movie star and former governor of California was in the grandstand to watch the second leg of Saturday's slalom race.  Schwarzenegger, who launched the R20 climate change organization in 2011, said seven million people die every year because of global pollution, a reference to a World Health Organization report released in December.  "It is extremely important that in order to be successful with our environmental crusade and to fight global climate change and to fight all of the pollution we have worldwide, we all have to work together," Schwarzenegger told Christina Macfarlane for CNN's Alpine Edge at the Rasmushof Alm hotel in the upmarket resort.  "And the more people we bring into the crusade the better it is. The world leaders alone have not been 

In [19]:
# Collect one row per token, then build the frame once
# (DataFrame.append was removed in pandas 2.0).
rows = [{'Text': token.text, 'Lemma': token.lemma_, 'POS': token.pos_, 'TAG': token.tag_,
         'DEP': token.dep_, 'Shape': token.shape_, 'Stop Word': token.is_stop}
        for token in docs[10]]
df_pos = pd.DataFrame(rows, columns=['Text', 'Lemma', 'POS', 'TAG', 'DEP', 'Shape', 'Stop Word'])

df_pos

         Text       Lemma    POS    TAG    DEP  Shape  Stop Word
0   Kitzbuhel   kitzbuhel  PROPN    NNP   ROOT  Xxxxx      False
1           ,           ,  PUNCT      ,  punct      ,      False
2     Austria     austria  PROPN    NNP  appos  Xxxxx      False
3           (           (  PUNCT  -LRB-  punct      (      False
4         CNN         cnn  PROPN    NNP  appos    XXX      False
5           )           )  PUNCT  -RRB-  punct      )      False
6           "           "  PUNCT     ``  punct      "      False
7         The         the    DET     DT    det    Xxx      False
8  Terminator  terminator  PROPN    NNP  nsubj  Xxxxx      False
9           "           "  PUNCT     ``  punct      "      False
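
Row 7 of the table above shows `The` with Stop Word set to False, even though its lemma `the` is a common stop word, so a filter that relies on `is_stop` alone would keep it. The following is a minimal sketch of a content-word filter that also checks the lowercased lemma against an explicit stop-word set. The function name `content_lemmas`, the `stop_words` parameter and the stand-in tokens are assumptions made for illustration; they are not part of the notebook.

```python
from types import SimpleNamespace


def content_lemmas(tokens, stop_words):
    """Return lowercased lemmas of tokens that are neither punctuation nor stop words.

    `tokens` is any iterable of objects exposing lemma_, pos_ and is_stop
    (spaCy tokens do); `stop_words` is a hypothetical set of lowercase words.
    """
    kept = []
    for tok in tokens:
        lemma = tok.lemma_.lower()
        # Drop punctuation, tokens flagged by is_stop, and tokens whose
        # lowercased lemma is in the explicit stop-word set.
        if tok.pos_ == "PUNCT" or tok.is_stop or lemma in stop_words:
            continue
        kept.append(lemma)
    return kept


# Stand-in tokens copied from a few rows of the table (demo data only).
demo_tokens = [
    SimpleNamespace(lemma_="kitzbuhel", pos_="PROPN", is_stop=False),
    SimpleNamespace(lemma_=",", pos_="PUNCT", is_stop=False),
    SimpleNamespace(lemma_="austria", pos_="PROPN", is_stop=False),
    SimpleNamespace(lemma_="the", pos_="DET", is_stop=False),
    SimpleNamespace(lemma_="terminator", pos_="PROPN", is_stop=False),
]
demo_lemmas = content_lemmas(demo_tokens, stop_words={"the"})
print(demo_lemmas)  # ['kitzbuhel', 'austria', 'terminator']
```

With real documents, something like `content_lemmas(docs[10], STOP_WORDS)` should work, where `STOP_WORDS` is imported from `spacy.lang.en.stop_words`.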


In [20]:
for doc in docs:
    displacy.render(doc, style='ent', jupyter=True)
    print('=============================================================================================================')

In [21]:
# Collect one row per entity, then build the frame once
# (DataFrame.append was removed in pandas 2.0).
rows = [{'Text': ent.text, 'Start Char': ent.start_char, 'End Char': ent.end_char, 'Label': ent.label_}
        for ent in docs[10].ents]
df_ent = pd.DataFrame(rows, columns=['Text', 'Start Char', 'End Char', 'Label'])

df_ent

                                    Text  Start Char  End Char        Label
0                              Kitzbuhel           0         9          GPE
1                                Austria          11        18          GPE
2                                    CNN          20        23          ORG
3  The Terminator" Arnold Schwarzenegger          26        63  WORK_OF_ART
4                                     US          69        71          GPE
5                           Donald Trump          82        94       PERSON
6                                 annual         158       164         DATE
7                              Kitzbuhel         203       212          GPE
8                               Austrian         218       226         NORP
9                             California         272       282          GPE
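
The entity table above lists one document at a time. To get a rough picture of which labels dominate across all processed documents, the labels can be tallied with a `Counter`. This is only a sketch: the helper name `count_entity_labels` and the stand-in document are assumptions for illustration, and the function expects nothing more than objects with an `ents` attribute whose items expose `label_`, as spaCy docs do.

```python
from collections import Counter
from types import SimpleNamespace


def count_entity_labels(docs):
    """Count how often each entity label occurs across an iterable of docs."""
    counts = Counter()
    for doc in docs:
        counts.update(ent.label_ for ent in doc.ents)
    return counts


# Stand-in document built from a few rows of the table (demo data only).
demo_doc = SimpleNamespace(ents=[
    SimpleNamespace(text="Kitzbuhel", label_="GPE"),
    SimpleNamespace(text="Austria", label_="GPE"),
    SimpleNamespace(text="CNN", label_="ORG"),
    SimpleNamespace(text="Donald Trump", label_="PERSON"),
])
label_counts = count_entity_labels([demo_doc])
print(label_counts.most_common())
```

In the notebook this would be called as `count_entity_labels(docs)`, and the resulting counter could be passed to `pd.Series` for display.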


In [22]:
for sent in docs[10].sents:
    print(sent)
    tmp_doc = sent.as_doc()
    tmp_doc.user_data['title'] = sent.text
    displacy.render(tmp_doc, style='dep', jupyter=True, options={'compact': True})

Kitzbuhel, Austria (CNN) "


The Terminator" Arnold Schwarzenegger says US President Donald Trump is making a "big mistake" on environmental policy.  


Making his annual pilgrimage to the famous ski races in Kitzbuhel, the Austrian muscleman, movie star and former governor of California was in the grandstand to watch the second leg of Saturday's slalom race.  


Schwarzenegger, who launched the R20 climate change organization in 2011, said seven million people die every year because of global pollution, a reference to a World Health Organization report released in December.  


"It is extremely important that in order to be successful with our environmental crusade and to fight global climate change and to fight all of the pollution we have worldwide, we all have to work together," Schwarzenegger told Christina Macfarlane for CNN's Alpine Edge at the Rasmushof Alm hotel in the upmarket resort.  


"And the more people we bring into the crusade the better it is.


The world leaders alone have not been able to solve the problem and they won't.  


"Whether it is the Indian movement or the women's suffrage movement, or anti apartheid movement or the civil rights movement in America -- none of those were solved in the capitals, always by people.


So people power is essential."  


Late last year Trump, who previously has called climate change a "hoax," rejected his own administration's report that climate change could be devastating for the economy, saying "I don't believe it."  


'


Big mistake'  Schwarzenegger said Trump would one day regret his June 2017 decision to pull the United States out of the Paris Agreement on climate change by 2020.  


The Paris Agreement committed almost 200 countries to keep global warming well below two degrees Celsius above pre-industrial levels and, if possible, below 1.5 degrees.  


The US ratified the Agreement, but Trump in his first year as President said that the US would withdraw from it, although it cannot formally leave until November 2020.  


Arnold Schwarzenegger soaks up the atmosphere in Kitzbuhel.  


"The United States is in, it's just the federal government is not," added the 71-year-old Schwarzenegger.  


"Every city in America is on board to cut back on pollution.


There's tremendous innovation in technology and design.


Everyone is working together, except Trump.  


"Eventually one day he will wake up and he will realize he made a big mistake to sell out to the oil companies and coal companies rather than to sell out to the American people.  


"The message is, he has his beliefs, I have mine.


Even though he is wrong on this issue I still wish him the best of luck because if our President is successful and we as a country are successful, the world is successful."  


Schwarzenegger grew up in Styria and is a long-time ski enthusiast.  


"It's not just global climate change, it's about the pollution that it creates that kills so many people worldwide," he added.  


"Seven million people die every year —


this is inexcusable.


We have to fight, we have to do something about it, and the only way we can win is by doing it together."  


Photos: Kitzbuhel, Austria ski resort guide


Tyrolean treasure: Kitzbuhel is the home of the infamous Hahnenkamm World Cup ski race every January, but the charming Austrian town offers much more than just a death-defying downhill.


Hide Caption 1 of 16 Photos: Kitzbuhel, Austria ski resort guide Sparkling gem: Kitzbuhel is a former silver mining town and a medieval jewel in the heart of Austria's Tirol, 60 miles east of Innsbruck.


Hide Caption 2 of 16 Photos: Kitzbuhel, Austria ski resort guide Hahnenkamm hysteria: The annual World Cup race on the Streif run is the scariest and hairiest on the circuit with thrills and plenty of spills to entertain the huge crowds that flood in.


Hide Caption 3 of 16 Photos: Kitzbuhel, Austria ski resort guide Blue riband: The Hahnenkamm downhill is the highlight of the World Cup circuit and race weekend creates a carnival atmosphere in Kitzbuhel.


Hide Caption 4 of 16 Photos: Kitzbuhel, Austria ski resort guide


He's back:


Celebrities and the jet-set turn out in force to see and be seen.


Austrian native and Terminator star Arnold Schwarzenegger is a regular fixture at the Hahnenkamm finish.


Hide Caption 5 of 16 Photos: Kitzbuhel, Austria ski resort guide Nerves of steel: The Hahnenkamm race requires guts and a no-fear approach to tackle the Streif's huge jumps, and steep, icy terrain as it plunges back towards the town.


Hide Caption 6 of 16 Photos: Kitzbuhel, Austria ski resort guide


Fever pitch: Just making it to the bottom is a feat in itself.


Plenty of racers' seasons have been ended in spectacular crashes on the treacherous descent.


Hide Caption 7 of 16 Photos: Kitzbuhel, Austria ski resort guide No guts, no glory: Winning at Kitzbuhel is the ultimate for a downhill racer.


Austrian great Franz Klammer did it four times but Swiss Didier Cuche (pictured) holds the record with five wins.


Hide Caption 8 of 16 Photos: Kitzbuhel, Austria ski resort guide Chocolate-box charm:


Away from the madness of race weekend, Kitzbuhel is one of the most beautiful settings in the Alps with a pretty, cobbled medieval center.


Hide Caption 9 of 16 Photos: Kitzbuhel, Austria ski resort guide


After dark: This being Austria, the bar scene is buzzing with plenty of watering holes to refresh thirsty skiers and snowboarders after a long day on the mountain.


The Londoner pub is an Alpine institution.


Hide Caption 10 of 16 Photos: Kitzbuhel, Austria ski resort guide Street party: Kitzbuhel's pedestrianized center


is perfect for ambling and taking in the upmarket boutiques, cafes and restaurants.


Hide Caption 11 of 16 Photos: Kitzbuhel, Austria ski resort guide White playground: Kitzbuhel's skiing area is linked with that of Kirchberg.


Together they offer 54 lifts and about 180 kilometers of skiing with endless backcountry opportunity.


Hide Caption 12 of 16 Photos: Kitzbuhel, Austria ski resort guide Ski safari: As well as the runs on the Hahnenkamm mountain, the ski region includes slopes on the Kitzbuheler Horn, as well as the interlinked areas of Jochberg, Resterhohe and Pass Thurn.


Hide Caption 13 of 16 Photos: Kitzbuhel, Austria ski resort guide Cruising grounds: Much of the skiing is tree lined and suits mileage hungry intermediates, although beginners and experts are well catered for.


Hide Caption 14 of 16 Photos: Kitzbuhel, Austria ski resort guide Austrian hospitality: Kitzbuhel is home to myriad four and five-star hotels with a wealth of upmarket eateries and plenty of cosy huts for on-mountain refreshments.


Hide Caption 15 of 16 Photos: Kitzbuhel, Austria ski resort guide Multi talented: Kitzbuhel is more than just a ski resort, with activities such as polo on ice as well as a thriving summer scene including hiking, golf and tennis.


Hide Caption 16 of 16  Schwarzenegger said one of the major problems in the fight is that climate change and global pollution have "no borders.  


"It doesn't matter if you come from a naturally clean country like Austria or Brazil, we all have to work together to fight this problem of this unbelievable pollution that eventually going to kill everyone," he added.  


"The world leaders need to take it seriously and put a time clock on it and say, 'OK, within the next five years we want to accomplish a certain kind of a goal,' rather than push it off until 2035."  


Austrian skiing great Franz Klammer, the 1976 Olympic downhill champion and four-time Kitzbuhel winner, is an ambassador for Schwarzenegger's program.  


"We really have to take care of our planet for the future of our children," he told CNN.


"If we're not taking action now it will be soon too late.  


"That's why it's very important


Arnold is out speaking about this subject a lot, just to make us all recognize it is almost 12 o'clock.


We have to take action."  


This is not the first time Schwarzenegger has taken stabs at Trump.


Last year, the movie-star-turned-politician spoke out against Trump's "zero-tolerance" policy for illegal border crossings, which led to children being separated from their parents.
