# Look at uses of a target word over time

In [1]:
from __future__ import print_function
import time
import numpy as np
import pandas as pd
import pyarrow
import fastparquet
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
%matplotlib inline
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import seaborn as sns
import csv
import textwrap
from scipy.spatial.distance import cosine
import spacy
from collections import defaultdict 
from tqdm import tqdm

pd.set_option('display.max_colwidth', 500)



Load in the file

In [2]:
target = "human"

In [3]:
tokens = pd.read_csv('./data/logic_words/{}.csv'.format(target))

In [4]:
len(tokens)

146424

In [5]:
parquet_file = "/Volumes/data_gabriella_chronis/corpora/acl-publication-info.74k.parquet"

df = pd.read_parquet(parquet_file, engine='pyarrow')

Left-join the large metadata file onto the token file. Alternatives to consider: a constant-time lookup, or joining only the year column.

In [6]:
data = tokens.join(df.set_index("corpus_paper_id"), on="corpus_id")
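
On the open question of fetching only the year: a lighter alternative to the full join is to build a Series indexed by paper id and map it over the token table. This is a minimal sketch on toy frames, not the notebook's actual data. The column names `corpus_id`, `corpus_paper_id`, and `year` come from the cells above; the `_demo` frames and all values in them are invented for illustration.

```python
import pandas as pd

# Toy stand-ins for the real tables (values are made up for illustration).
tokens_demo = pd.DataFrame({
    "corpus_id": [1, 2, 2, 3],
    "sentence": ["a human editor", "human translators",
                 "the human mind", "human judgments"],
})
papers_demo = pd.DataFrame({
    "corpus_paper_id": [1, 2, 3],
    "year": [1955, 1968, 2021],
    "title": ["t1", "t2", "t3"],
})

# Build an id -> year lookup and map it over the tokens;
# this behaves like a left join restricted to one column.
year_by_paper = papers_demo.set_index("corpus_paper_id")["year"]
tokens_demo["year"] = tokens_demo["corpus_id"].map(year_by_paper)

# Ids missing from the metadata come back as NaN, so count them
# before casting the column to int.
n_missing = int(tokens_demo["year"].isna().sum())
```

Checking `n_missing` first matters because `astype(int)` raises on NaN values, which is what unmatched ids produce.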

Add a decade column 

In [7]:
data["year"] = data["year"].astype(int)

In [8]:
data["decade"] = ( data['year'] //10)*10

### Look at 5 example sentences from each decade

In [9]:
#df.style.set_properties(subset=['sentence'], **{'width': '300px'})
pd.set_option('display.max_rows', 1000)


data.groupby('decade').sample(5)[['decade', 'sentence']]

Unnamed: 0,decade,sentence
105176,1950,"And this, of course, is the responsibility of the human being who sets up the grammatical rules and the dictionary."
132220,1950,"The human translator operates with the tremendous advantage of something called ""context""."
105010,1950,"The machine must be capable of resolving idiomatic, contextual, and syntactic ambiguities if human editing is to be kept at a minimum and maximum intelligibility is to be achieved."
105184,1950,"The solution of, say, a system of certain differential equations was regarded until recently as a performance of which only a highly gifted and thoroughly trained human brain is capable."
103800,1950,One can only feel torn between admiration for the flexibility of the human mind and despair at the thought of trying to reproduce such flexibility in any mechanical device.
105237,1960,"Though machines could doubtless provide a great variety of aids to human translation, so far in no case has economic feasibility of any such aid been proven, though the outlook for the future is not all dark."
141269,1960,"High quality machine translation apparently demands a fair portion of the total language-manipulating capability of the human, but essay grading may use only a fraction of it, and may process language in ways quite different from that of the human being."
106201,1960,The method followed by us is based on the hypothesis that the linguistic performance of human memory consists in a constant segmentation or r4construction of the signs of the linguistic code on levels which are graded and organized each in accordance wlth its own rules~ in function of the specific capacities of the human braln~ and with a ~ertain degree of productiveness.
141249,1960,"The first row refers to the simulation of the human judgment, without great concern about the way this judgment was produced."
54470,1960,"Furthermore, we had limited ourselves in the transformation data to a choice of one syntactic output for each sentence --the output identical with that of the human translation."


In [10]:
save = data.groupby('decade').sample(5)[['decade', 'sentence']]
save

Unnamed: 0,decade,sentence
105134,1950,The extremely interesting results achieved by Abraham Kaplan in a study made for the Rand Corporation (1) partially explain the human editor's success.
103981,1950,"The burden on the supply side is too great, the extent of human intervention too large, the essential and most complicated aspect of MT, that of multiple grammatical and non-grammatical meaning, remains unmechanized."
105179,1950,Do human translators work equally in both directions?
103800,1950,One can only feel torn between admiration for the flexibility of the human mind and despair at the thought of trying to reproduce such flexibility in any mechanical device.
105136,1950,"If the machine can produce its part in a time span comparable with that of the conventional human translator, the machine post-editor partnership may well be able to compete in time and accuracy with an all-human translator."
105078,1960,98026) One of the things which most helps a human translator in understanding a text is the whole representational world which words continually evoke.
105285,1960,"If a human being had to look up two thousand different words in a large dictionary, he might write each word on one card, sort the cards into alphabetical order, and then work through the pile of cards during a single reading of the dictionary from A to Z. For a computer, the sorting of the words into alphabetical order is a simple procedure, but the time required for the sort increases faster than the size of the group of words to be sorted; thus there is some optimum number of words to loo..."
125076,1960,"We have found that the operations by means of which human beings construct the nominata of their words are always the same, but we have also found that it is rare for two words of different natural languages to have the same nominatum."
54470,1960,"Furthermore, we had limited ourselves in the transformation data to a choice of one syntactic output for each sentence --the output identical with that of the human translation."
105175,1960,"However, this t~a semantic deep-structure for the hi-lingual dictlonary-entry of ~deliberations"" of the following form: ~r Chlne account for the translations~ which good human anslators actually produce~using the kind of modern which has been reported o~ this paper, the problem is that of finding the ~ structures of the dlctionary-entries from the data actually given by a bilingual corpus; for the construction of the squareforming templates must depend on these-that is if the template-glossa..."
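
Cells 9 and 10 draw different samples because `sample` is called without a seed. Below is a minimal sketch of a reproducible draw on toy data. It assumes a fixed `random_state` is acceptable (the seed value 0 is arbitrary) and caps the sample size so that a decade with fewer than `n` rows does not raise an error. `sample_per_decade` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd

# Toy data: three decades with 3, 2, and 1 rows respectively.
demo = pd.DataFrame({
    "decade": [1950, 1950, 1950, 1960, 1960, 1970],
    "sentence": ["s1", "s2", "s3", "s4", "s5", "s6"],
})

def sample_per_decade(frame, n, seed=0):
    """Draw up to n rows per decade, reproducibly (hypothetical helper)."""
    parts = [
        # min() keeps small decades from raising when they hold fewer than n rows.
        group.sample(min(n, len(group)), random_state=seed)
        for _, group in frame.groupby("decade")
    ]
    return pd.concat(parts)[["decade", "sentence"]]

first = sample_per_decade(demo, n=2)
second = sample_per_decade(demo, n=2)  # same seed, so the same rows as `first`
```

With a fixed seed, the saved sample matches the one inspected on screen, which makes the decade notes easier to trace back to specific sentences.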


human | ˈ(h)yo͞omən |
adjective

relating to or characteristic of people or human beings: the human body.
1.  of or characteristic of people as opposed to God or animals or machines, especially in being susceptible to weaknesses: they are only human and therefore mistakes do occur | the risk of human error.
2.  of or characteristic of people's better qualities, such as kindness or sensitivity: the human side of politics is getting stronger.
3.   Zoology of or belonging to the genus Homo.

### ACL Human

|decade | notes |
|---------|-------------------|
|1950 | human editor, burden of human intervention, human translator |
|1960 | human translator, if a human had to perform a word lookup, humans construct the nominata of their words |
|1970 | what human beings do, a human's mental machinery, human languages |
|1980 | time spent on human interaction, for human entities (metalinguistic), human-human communication |
|1990 | human levels of performance, level expected of a human language learner, human-machine interface, human acquisition techniques (contrasted with statistical), instructions to human summarizers |
|2000 | human annotation agreement |
|2010 | human pairwise comparison, human post-edit, dictates that human players act according to (annotation instructions) |
|2020 | human judgments, human annotators |

Human in contrast with computers. Specifically, human as a source of truth about meaning relative to computers, as opposed to human as fallible relative to God or to human nature (see below).

Movement away from emphasis on people themselves and their activity (human judge; simulating human translators) toward emphasis on the data produced by people (human judgment). "I'm not going to do it, but if I did, I'd do it this way."
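
One way to check this shift quantitatively would be to count the word that immediately follows the target in each decade. The sketch below runs on invented sentences and uses a plain regex rather than a tokenizer such as spaCy, so it is only a rough approximation. `right_collocates` is a hypothetical helper, not part of the notebook.

```python
import re
from collections import Counter, defaultdict

import pandas as pd

# Invented sentences, loosely modeled on the decade notes above.
demo = pd.DataFrame({
    "decade": [1950, 1950, 1960, 2020, 2020, 2020],
    "sentence": [
        "The human translator uses context.",
        "A human editor fixes the output.",
        "Aids to human translation are costly.",
        "We collect human judgments.",
        "Human annotators label the data.",
        "Scores correlate with human judgments.",
    ],
})

def right_collocates(frame, target):
    """Count the word following `target` per decade, case-insensitively."""
    pattern = re.compile(
        r"\b{}\s+([a-z]+)".format(re.escape(target)), re.IGNORECASE
    )
    counts = defaultdict(Counter)
    for decade, sentence in zip(frame["decade"], frame["sentence"]):
        for following in pattern.findall(sentence):
            counts[int(decade)][following.lower()] += 1
    return counts

collocates = right_collocates(demo, "human")
top_2020 = collocates[2020].most_common(1)
```

On the real data, comparing the top collocates across decades would show whether agent nouns (translator, editor) give way to product nouns (judgments, annotations).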

### COCA Human

|senses | snippets|
|---|--|
|humility | tragic human outcome, feeble human minds, never was a more noble and yet more human painter, human nature |
|contrastive | human purposes (as opposed to God's); all around them (gorillas) human activity is increasing |
|of or pertaining to | human languages, human prehistory, human limbs |

Would we call these senses polysemous? 

(3)   Their ancestral home is much smaller than it once was. All around them, human activity is increasing. Yet, the mountain gorillas continue to live and move in (original)
    
    (a) all around them, anthropoid activity is increasing. (synonym for sense 1)
    (b) all around them, imperfect activity is increasing (synonym for sense 2, weakness subsense)
    (c) all around them, fleshy activity is increasing (synonym for sense 2, spirit contrast subsense)
    (d) ? all around them, humane activity is increasing (synonym for sense 1)

Senses 1 and 2 are distinct in this case, with sense 2 being acceptable and sense 1 not.

All of the other senses of human, apart from the comparator sense, carry a value judgment, do they not? What, then, is the value in the machine-comparator sense of human?