Take a random sample of 100 of the sentences, and for each one try to determine whether all the events described are physical, or if they involve any sort of planning, goal-setting, wanting, or anything could be described as a thought. 

For example a sentence like “The animal finds or builds a suitable place to hibernate where they can stay warm and protected.” Involves the rational act of “finding”, so it involves some thought.

While a sentence like “Solar radiation reaches Earth's atmosphere.” Is entirely physical.

In [9]:
import pandas as pd
import io
import requests


In [10]:
data=pd.read_csv("data/flattened_paragraphs.csv", index_col=None)

In [11]:
sample_100 = data.sample(100)
sample_100[:3]

Unnamed: 0,line_id,paragraphID,sentence ID,topic,prompt,sentence
328,43,140,7,lungs,How do the lungs work?,The lungs expel the carbon dioxide through the...
386,53,211,1,rain,How does rain occur?,Water droplets in clouds collide.
2696,395,1066,8,mushroom,Describe the life cycle of a mushroom.,A primordia shoots out of the ground.


In [12]:
sample_100.to_csv('sample_100_if_thoughts.csv')

### Restructure the sample100 data

In [13]:
data100 = pd.read_csv("data/sample_100_if_thoughts.csv")

In [14]:
data100[:10]

Unnamed: 0,line_id,paragraphID,sentence ID,topic,prompt,sentence,is_thought_nin,is_thought_leo
0,13,37,6,fossil,How are fossils formed?,Fossils are formed.,0.0,0
1,18,48,8,volcanic eruption,What happens during a vocanic eruption?,Mudslides and ash clouds cause problems after ...,0.0,0
2,24,58,9,flood,How do floods happen?,The rest of the floodwater evaporates.,0.0,0
3,25,61,1,sedimentary rock,How does sediment turn into sedimentary rock?,Minerals fill spaces between bits of sediment.,0.0,0
4,29,69,6,coal,How does coal form?,Pressure caused by sedimentary rocks squeezes ...,0.0,0
5,31,76,4,sedimentary rock,How does sedimentary rock form?,The layers of sediment are pressed together by...,0.0,0
6,55,214,2,seed dispersal,How do animals help plants disburse seeds?,Animals eat the fruit.,0.0,1
7,56,221,3,cloud,How are clouds formed?,Some of the vapor condenses onto tiny pieces o...,0.0,0
8,62,251,5,tree rings,How do rings form inside the trunk of a tree?,The bark leaves a ring.,0.0,0
9,65,254,5,seed dispersal,How do plants use animals to help disburse the...,The seed is digested by the animal and excrete...,0.0,0


In [15]:
data100 = data100.drop(["Unnamed: 0","index"],axis=1)
data100[:5]

KeyError: "['Unnamed: 0' 'index'] not found in axis"

In [16]:
import numpy as np
data100["is_thought_nin"] = data100["is_thought_nin"].replace(np.nan, "0")

In [17]:
df_merge = pd.merge(data, data100, on=['sentence']).dropna()

In [18]:
df_merge[:5]

Unnamed: 0,line_id_x,paragraphID_x,sentence ID_x,topic_x,prompt_x,sentence,line_id_y,paragraphID_y,sentence ID_y,topic_y,prompt_y,is_thought_nin,is_thought_leo
0,13,37,6,fossil,How are fossils formed?,Fossils are formed.,13,37,6,fossil,How are fossils formed?,0.0,0
1,18,48,8,volcanic eruption,What happens during a vocanic eruption?,Mudslides and ash clouds cause problems after ...,18,48,8,volcanic eruption,What happens during a vocanic eruption?,0.0,0
2,24,58,9,flood,How do floods happen?,The rest of the floodwater evaporates.,24,58,9,flood,How do floods happen?,0.0,0
3,25,61,1,sedimentary rock,How does sediment turn into sedimentary rock?,Minerals fill spaces between bits of sediment.,25,61,1,sedimentary rock,How does sediment turn into sedimentary rock?,0.0,0
4,29,69,6,coal,How does coal form?,Pressure caused by sedimentary rocks squeezes ...,29,69,6,coal,How does coal form?,0.0,0


In [None]:
df_merge.to_csv("data/sample_100_if_thoughts.csv", index=False)

## Filter out pure physical sentences from the size-100 sample

In [19]:
df_merge = df_merge[df_merge.is_thought_leo == 0]
df_merge = df_merge[df_merge.is_thought_nin == 0]

In [None]:
df_merge.to_csv("data/sample100_pure_physical.csv", index=False)

## Extract verbs for these sentences

In [20]:
import pandas as pd
pure_phy_df = pd.read_csv('data/sample100_pure_physical.csv')

In [21]:
pure_phy_df["sentence"]

0                                   Fossils are formed.
1     Mudslides and ash clouds cause problems after ...
2                The rest of the floodwater evaporates.
3        Minerals fill spaces between bits of sediment.
4     Pressure caused by sedimentary rocks squeezes ...
                            ...                        
60         When rain goes into soil it creates an acid.
61    Formation occurs when the star runs out of hyd...
62    Radioisotopes would like to be stable isotopes...
63                  The liver converts ammonia to urea.
64                                  The scab falls off.
Name: sentence, Length: 65, dtype: object

In [25]:
import stanza


ModuleNotFoundError: No module named 'stanza'

### Restructure the sample100 data

In [28]:
data100 = pd.read_csv("data/sample_100_if_thoughts.csv")

In [29]:
data100[:10]

Unnamed: 0.1,Unnamed: 0,sentence,index,is_thought_nin,is_thought_leo
0,938,Living things die.,135,0.0,0
1,2287,The resulting information is interpreted by th...,330,1.0,1
2,564,Balance your body as you roll on the skateboard.,79,1.0,1
3,640,Acid rain enters the atmosphere and lands.,91,0.0,0
4,492,The materials are purchased by manufacturers.,68,1.0,1
5,2628,Only wanted elements are passed to within the ...,386,1.0,0
6,3244,The grinder is activated.,479,0.0,1
7,1300,They take in fluids and liquids.,184,0.0,0
8,2488,Decomposition produces methane.,364,0.0,0
9,1099,Different parts of the car cause the motion in...,156,0.0,0


In [30]:
data100 = data100.drop(["Unnamed: 0","index"],axis=1)
data100[:5]

Unnamed: 0,sentence,is_thought_nin,is_thought_leo
0,Living things die.,0.0,0
1,The resulting information is interpreted by th...,1.0,1
2,Balance your body as you roll on the skateboard.,1.0,1
3,Acid rain enters the atmosphere and lands.,0.0,0
4,The materials are purchased by manufacturers.,1.0,1


In [33]:
import numpy as np
data100["is_thought_nin"] = data100["is_thought_nin"].replace(np.nan, "0")

In [34]:
df_merge = pd.merge(data, data100, on=['sentence']).dropna()

In [35]:
df_merge[:5]

Unnamed: 0,line_id,paragraphID,sentence ID,topic,prompt,sentence,is_thought_nin,is_thought_leo
0,13,37,6,fossil,How are fossils formed?,Fossils are formed.,0,0
1,18,48,8,volcanic eruption,What happens during a vocanic eruption?,Mudslides and ash clouds cause problems after ...,0,0
2,24,58,9,flood,How do floods happen?,The rest of the floodwater evaporates.,0,0
3,25,61,1,sedimentary rock,How does sediment turn into sedimentary rock?,Minerals fill spaces between bits of sediment.,0,0
4,29,69,6,coal,How does coal form?,Pressure caused by sedimentary rocks squeezes ...,0,0


In [39]:
df_merge.to_csv("data/sample_100_if_thoughts.csv", index=False)

## Filter out pure physical sentences from the size-100 sample

In [42]:
df_merge = df_merge[df_merge.is_thought_leo == 0]
df_merge = df_merge[df_merge.is_thought_nin == 0]

In [44]:
df_merge.to_csv("data/sample100_pure_physical.csv", index=False)

## Extract verbs for these sentences

In [5]:
import pandas as pd
pure_phy_df = pd.read_csv('data/sample100_pure_physical.csv')