## Code Plan

1. Scrape the speeches and keep only the sentences that mention "China".
2. Use a model to classify each sentence as subjective or neutral.
3. Use a model to classify each sentence as positive or negative.
4. Track the average scores as a time series.
5. Examine how specific events influence the scores.

In [47]:
#pip install transformers
#pip install torch torchvision torchaudio

In [2]:
import torch

In [3]:
from transformers import pipeline

In [4]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

In [5]:
import json
import os

In [6]:
tokenizer = AutoTokenizer.from_pretrained("cffl/bert-base-styleclassification-subjective-neutral")



In [7]:
def process_sentences(json_file_path, max_length=512):
    processed_sentences = []
    
    # read json file
    with open(json_file_path, 'r') as file:
        sentences_data = json.load(file)
    
    # iterate each sentence
    for entry in sentences_data:
        sentence = entry['sentence']
        
        # tokenize and truncate
        tokens = tokenizer(sentence, truncation=True, max_length=max_length, return_tensors="pt", padding=True)
        
        # convert to text
        truncated_sentence = tokenizer.decode(tokens['input_ids'][0], skip_special_tokens=True)
        
        # save processed sentence
        processed_sentences.append(truncated_sentence)
    
    return processed_sentences

In [8]:
classify = pipeline(
    task="text-classification",
    model="cffl/bert-base-styleclassification-subjective-neutral",
    return_all_scores=True,
)



In [60]:
classify(input_text3)

[[{'label': 'SUBJECTIVE', 'score': 0.9008364677429199},
  {'label': 'NEUTRAL', 'score': 0.09916352480649948}]]
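
The nested list returned above can be flattened into a label-to-score mapping, so that later code does not depend on the order in which the labels come back. This is a minimal sketch: `scores_by_label` is a hypothetical helper name, not part of `transformers`, and the sample value is copied from the output above.

```python
def scores_by_label(pipeline_output):
    """Turn [[{'label': ..., 'score': ...}, ...]] into {'LABEL': score}.

    Assumes the nested shape shown above (one inner list per input text)
    and reads only the first input's result.
    """
    return {item["label"]: item["score"] for item in pipeline_output[0]}


# Sample copied from the classifier output shown above.
sample_output = [[{"label": "SUBJECTIVE", "score": 0.9008364677429199},
                  {"label": "NEUTRAL", "score": 0.09916352480649948}]]
scores = scores_by_label(sample_output)
print(scores["SUBJECTIVE"], scores["NEUTRAL"])
```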

In [9]:
with open('./speech_json/37_nixon_speech.json', 'r') as file:
    nixon_speech = json.load(file)

with open('./speech_json/38_ford_speech.json', 'r') as file:
    ford_speech = json.load(file)

with open('./speech_json/39_carter_speech.json', 'r') as file:
    carter_speech = json.load(file)

with open('./speech_json/40_reagan_speech.json', 'r') as file:
    reagan_speech = json.load(file)

with open('./speech_json/41_herbertbush_speech.json', 'r') as file:
    herbertbush_speech = json.load(file)

with open('./speech_json/42_clinton_speech.json', 'r') as file:
    clinton_speech = json.load(file)

with open('./speech_json/43_walkerbush_speech.json', 'r') as file:
    walkerbush_speech = json.load(file)

with open('./speech_json/44_obama_speech.json', 'r') as file:
    obama_speech = json.load(file)

with open('./speech_json/45_trump_speech.json', 'r') as file:
    trump_speech = json.load(file)

with open('./speech_json/46_biden_speech.json', 'r') as file:
    biden_speech = json.load(file)

with open('./speech_json/47_vicepresident_biden_speech.json', 'r') as file:
    vicepresident_biden_speech = json.load(file)
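
The eleven near-identical `open` calls above could probably be folded into one loop. The sketch below assumes every file in `./speech_json` follows the `<number>_<name>_speech.json` pattern seen above. `load_speeches` is a hypothetical helper, and it returns a dict rather than the separate variables used in the rest of this notebook.

```python
import json
from pathlib import Path


def load_speeches(folder="./speech_json"):
    """Load every *_speech.json file in `folder` into a dict.

    Keys are derived from the file name,
    e.g. '37_nixon_speech.json' -> 'nixon'.
    """
    speeches = {}
    for path in sorted(Path(folder).glob("*_speech.json")):
        # Drop the leading number, then the trailing '_speech'.
        name = path.stem.split("_", 1)[1]
        if name.endswith("_speech"):
            name = name[: -len("_speech")]
        with open(path, "r", encoding="utf-8") as file:
            speeches[name] = json.load(file)
    return speeches


# Usage (not executed here): speeches = load_speeches()
```

With this in place, `speeches["nixon"]` would hold the same list as `nixon_speech`.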



In [10]:
def calculate_scores(speech_file, output_file):
    results = []

    for entry in speech_file:
        sentence = entry['sentence']
        date = entry['date']
        tokens = tokenizer(sentence, truncation=True, max_length=512, return_tensors="pt", padding=True)
        truncated_sentence = tokenizer.decode(tokens['input_ids'][0], skip_special_tokens=True)
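        result = classify(truncated_sentence)
        # Look the scores up by label so the code does not depend on output order
        label_scores = {item['label']: item['score'] for item in result[0]}
        subjective_score, neutral_score = label_scores['SUBJECTIVE'], label_scores['NEUTRAL']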

        # Collect sentences with their scores and dates
        results.append({
            "date": date,
            "subjective_score": subjective_score,
            "neutral_score": neutral_score,
            "sentence": sentence
        })

    # Save results to a JSON file
    with open(output_file, 'w') as outfile:
        json.dump(results, outfile, indent=4)

    return results

In [18]:
# nixon_results = calculate_scores(nixon_speech, 'nixon_scores.json')
# ford_results = calculate_scores(ford_speech, 'ford_scores.json')
# carter_results = calculate_scores(carter_speech, 'carter_scores.json')
# reagan_results = calculate_scores(reagan_speech, 'reagan_scores.json')
# herbertbush_results = calculate_scores(herbertbush_speech, 'herbertbush_scores.json')
# clinton_results = calculate_scores(clinton_speech, 'clinton_scores.json')
# walkerbush_results = calculate_scores(walkerbush_speech, 'walkerbush_scores.json')
# obama_results = calculate_scores(obama_speech, 'obama_scores.json')
# trump_results = calculate_scores(trump_speech, 'trump_scores.json')
# biden_results = calculate_scores(biden_speech, 'biden_scores.json')
# vicepresident_biden_results = calculate_scores(vicepresident_biden_speech, 'vicepresident_biden_scores.json')
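
Step 4 of the plan (average scores over time) could be built on the entries that `calculate_scores` returns. The sketch below is only an illustration. It assumes each `date` string starts with a four-digit year, which this notebook does not confirm, so `year_of` may need adjusting to the real format. Both function names are hypothetical and the demo entries are made up.

```python
from collections import defaultdict


def year_of(date):
    """Assumption: the date string starts with a four-digit year."""
    return int(date[:4])


def average_subjective_by_year(results):
    """Average `subjective_score` per year over calculate_scores-style entries."""
    buckets = defaultdict(list)
    for entry in results:
        buckets[year_of(entry["date"])].append(entry["subjective_score"])
    # Return the years in ascending order.
    return {year: sum(vals) / len(vals) for year, vals in sorted(buckets.items())}


# Made-up entries, for illustration only.
demo = [
    {"date": "1972-02-21", "subjective_score": 0.9},
    {"date": "1972-02-28", "subjective_score": 0.7},
    {"date": "1973-01-05", "subjective_score": 0.4},
]
yearly = average_subjective_by_year(demo)
print(yearly)
```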


In [73]:
def calculate_average_scores(speech_file):
    scores = []
    total_subjective_score = 0
    total_neutral_score = 0
    count = 0
    subjective_sentences = []
    neutral_sentences = []

    for entry in speech_file:
        sentence = entry['sentence']
        tokens = tokenizer(sentence, truncation=True, max_length=512, return_tensors="pt", padding=True)
        truncated_sentence = tokenizer.decode(tokens['input_ids'][0], skip_special_tokens=True)
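        result = classify(truncated_sentence)
        # Look the scores up by label so the code does not depend on output order
        label_scores = {item['label']: item['score'] for item in result[0]}
        subjective_score, neutral_score = label_scores['SUBJECTIVE'], label_scores['NEUTRAL']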
        total_subjective_score += subjective_score
        total_neutral_score += neutral_score
        count += 1

        # Collect sentences with their scores
        subjective_sentences.append((sentence, subjective_score))
        neutral_sentences.append((sentence, neutral_score))

    average_subjective_score = total_subjective_score / count
    average_neutral_score = total_neutral_score / count

    print(f"Average Subjective Score: {average_subjective_score}")
    print(f"Average Neutral Score: {average_neutral_score}")
    scores.append(average_subjective_score)
    scores.append(average_neutral_score)

    # Sort and print top 10 subjective and neutral sentences
    subjective_sentences.sort(key=lambda x: x[1], reverse=True)
    neutral_sentences.sort(key=lambda x: x[1], reverse=True)

    print("\nTop 10 Subjective Sentences:")
    for sentence, score in subjective_sentences[:10]:
        print(f"{score:.4f}: {sentence}")

    print("\nTop 10 Neutral Sentences:")
    for sentence, score in neutral_sentences[:10]:
        print(f"{score:.4f}: {sentence}")

    return scores


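<h4 style="text-align: left;">Summary</h4>
<ul>
<li>Some sentences are too long and must be truncated; otherwise the model raises an error.</li>
<li>Some sentences are purely descriptive and carry no subjectivity, so they should be filtered out. One option is to drop sentences whose neutral score exceeds 0.9.</li>
</ul>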
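
The filtering idea in the second point might look like the sketch below. It works on entries shaped like the ones `calculate_scores` produces. `drop_descriptive` is a hypothetical name, the 0.9 cutoff comes from the note above, and the demo entries are made up.

```python
def drop_descriptive(results, neutral_threshold=0.9):
    """Keep only entries whose neutral score does not exceed the threshold."""
    return [entry for entry in results
            if entry["neutral_score"] <= neutral_threshold]


# Made-up entries, for illustration only.
demo_results = [
    {"sentence": "A purely descriptive sentence.", "neutral_score": 0.97},
    {"sentence": "A clearly opinionated sentence.", "neutral_score": 0.05},
    {"sentence": "A borderline sentence.", "neutral_score": 0.90},
]
kept = drop_descriptive(demo_results)
print([entry["sentence"] for entry in kept])
```

An entry scoring exactly 0.9 is kept here, because the note says to filter scores that exceed 0.9.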

In [45]:
nixon_scores = calculate_average_scores(nixon_speech)

Average Subjective Score: 0.6384612598364953
Average Neutral Score: 0.36153873993067415

Top 10 Subjective Sentences:
0.9960: Regrettably, only three holders of such passports were permitted entry to China
0.9945: "The meeting between the leaders of China and the United States is to seek the normalization of relations between the two countries and also to exchange views on questions of concern to the two sides." This announcement could have the most profound significance for future generations
0.9919: And the Prime Minister naturally said one city must be Shanghai, the biggest city in China
0.9900: Some rather naive observers have assumed that because I was going to Mainland China, that the differences between Mainland China and its 800 million people and its Government and that of the United States--that those differences would evaporate
0.9898: That visit does not mean that the differences between the Governments of the People's Republic of China and that of the United States are goi

In [64]:
ford_scores = calculate_average_scores(ford_speech)

Average Subjective Score: 0.6419007938230437
Average Neutral Score: 0.35809920947446205

Top 10 Subjective Sentences:
0.9938: China is a major country and a great country
0.9839: The People's Republic of China announced today the passing away of Chairman Mao Tse-tung
0.9787: Our agricultural abundance helped open the door to 800 million people on the mainland of China
0.9771: Well, in 1972 when we reopened the doors between China and the United States, a Shanghai communiqué was issued which called for the gradual movement of better relations, broader relations, deeper relations, aiming at some point to normalization of relations
0.9768: The process of normalizing relations with the People's Republic of China, in which Ambassador Gates will play a very vital role, is now well underway
0.9736: George Bush was our representative in the People's Republic of China and in that capacity did extremely well
0.9707: This visit gives us all a great farewell boost on our way to the People's Republ

In [74]:
carter_scores = calculate_average_scores(carter_speech)

Average Subjective Score: 0.6137709823428451
Average Neutral Score: 0.38622901762308864

Top 10 Subjective Sentences:
0.9970: We had a good and useful trip to China
0.9956: The Congress has acted wisely in passing legislation amending the Foreign Assistance Act to authorize the Overseas Private Investment Corporation (OPIC) to operate in the People's Republic of China
0.9940: And now a multibillion dollar expansion program in their own trade and also a multibillion dollar expansion program in investments, commercial investments in China, could very easily be financed through normal or private business loans because of China's very excellent credit rating
0.9935: The agreement resumes scheduled air service between our country and mainland China after a gap of 31 years
0.9907: There's a fairly good crop of grain in China and some other areas of the world
0.9900: I know there's a great feeling of gratitude in China and the United States for this new, wonderful relationship
0.9878: The nor

In [65]:
reagan_scores = calculate_average_scores(reagan_speech)

Average Subjective Score: 0.6146424750768584
Average Neutral Score: 0.38535752293097725

Top 10 Subjective Sentences:
0.9976: This excellent program brings young people from Canada, China, France, Germany, Italy, Japan, Mexico, and the United Kingdom to EPCOT
0.9963: China is now embarked on an exciting experiment designed to modernize the economy and quadruple the value of its national economic output by the year 2000
0.9928: Both the United States and China have stood together in condemning the evil and unlawful invasion of Afghanistan
0.9912: In FY-1983, nearly 10,000 immigrant visas were issued by our Foreign Service posts in China
0.9911: Reform in China is now widespread and dramatic
0.9910: In 1981 a bright, young American student, John Zeidman, came here to study China and to seek new friends
0.9892: Well, America and China are both great nations
0.9880: Horowitz was named consul general at the American consulate in Sydney, Australia, in 1981, and in 1984 to present, deputy chi

In [66]:
herbertbush_scores = calculate_average_scores(herbertbush_speech)

Average Subjective Score: 0.5798382258281269
Average Neutral Score: 0.42016177417269385

Top 10 Subjective Sentences:
0.9963: Charles Pei Wang, President of the China Institute in America, an outstanding new addition
0.9925: And China does have the proper policy
0.9915: And for this we must credit the reforms China embarked upon 10 years ago under Chairman Deng Xiaoping's farsighted leadership
0.9904: In 1989 U.S.-China trade amounted to $18 billion, and China was our 10th largest trading partner worldwide
0.9794: MFN is based on emigration, and emigration has continued from China at respectable levels
0.9764: Maintaining flexibility in administering our productive student and scholar exchange program with China is important
0.9749: We will keep them under review for future adjustments to respond to further positive developments in China
0.9733: But we've passed the day on the U.S.-China relationship where anyone talks about "playing a card." That was a term that was highly offensive t

In [80]:
clinton_scores = calculate_average_scores(clinton_speech)

Average Subjective Score: 0.6210600175086957
Average Neutral Score: 0.3789399818752506

Top 10 Subjective Sentences:
0.9962: Fifteen other airplanes followed on a daring one-way trip to Tokyo and on to China
0.9953: Under the wise, compassionate leadership of Eleanor Roosevelt, half a century ago 18 delegates from China to Lebanon, Chile to Ukraine forged the first international agreement on the rights of humankind
0.9949: The Administration notes that large portions and important portions of the President's interview with Chinese news organizations were carried and prominently featured in broadcast and reporting in China and that's good
0.9948: One, the United States Trade Representative's Office negotiated this tremendous agreement with China
0.9947: Many courageous proponents of change in China agree
0.9945: The first is the best Secretary of Education this country ever had, Dick Riley, who is in China tonight
0.9939: Third -- and a shameless attempt to get on ESPN's Sports Center o

In [81]:
walkerbush_scores = calculate_average_scores(walkerbush_speech)

Average Subjective Score: 0.6137421405494247
Average Neutral Score: 0.38625786029825787

Top 10 Subjective Sentences:
0.9947: China has undergone an amazing transformation in its economy
0.9939: And so China is a fascinating country that is significant in its size
0.9935: China is a great country
0.9935: China is a great country
0.9932: Australia, fortunately, has got a surplus with China
0.9932: And finally, I want to pay tribute to Sandy Randt, who has done a fabulous job as our Ambassador to China
0.9916: We helped bring China and Taiwan into the World Trade Organization, and that's good
0.9907: And so my job is to say to China, "Open up your markets." My job is to say to Europe, "Open up your markets." And we're making progress
0.9907: That's why China is very much involved in the process now, which is helpful
0.9903: China is a great emerging nation

Top 10 Neutral Sentences:
0.9754: Awards include the Fielding Internship Award and the Luce Scholarship to the People's Republic of 

In [82]:
obama_scores = calculate_average_scores(obama_speech)

Average Subjective Score: 0.634195702759308
Average Neutral Score: 0.3658042977734973

Top 10 Subjective Sentences:
0.9962: The former Governor of Washington State, current Commerce Secretary, soon-to-be Ambassador to China--that's all one person--Gary Locke is here with his beautiful wife
0.9960: And as President Obama rightly said just now, sound China-U.S
0.9959: China of course now is the second-largest economy in the world
0.9953: China appropriately called back a shipment of arms to Zimbabwe, and the U.S
0.9952: Unfortunately, Russia and China vetoed a resolution that would have passed through the Security Council
0.9942: Christie Vilsack
 His Excellency Ding Xuexiang, Permanent Deputy Director, General Office of Central Committee of the Communist Party of China
 Mr
0.9939: The new visa extension that begins today will bring more Chinese tourists to the United States and more American tourists to see the magnificent sights of China
0.9936: It's the spirit that brought a young wom

In [79]:
trump_scores = calculate_average_scores(trump_speech)

Average Subjective Score: 0.693941963181561
Average Neutral Score: 0.3060580367922221

Top 10 Subjective Sentences:
0.9968: That's going to be an easy one.
 Since China joined -- and it's another beauty -- the World Trade Organization in 2001, the United States has lost many more than 60,000 factories
0.9965: As he rightly pointed before leaving China, it was a week that changed the world
0.9953: Biden supported every globalist sellout of Ohio workers for over a half a century, including NAFTA, China's entry into the World Trade Organization, the disaster known as TPP, and the— how about the Paris Climate Accord? [ ] That's another beauty
0.9952: Unfortunately, it does come in from China
0.9952: For the last 30 years, China—and in all fairness, and other countries
0.9950: I said to Sonny Perdue, Secretary of Agriculture, great guy, "How much have we been targeted for our farmers by China?" He said, "Sir, two years ago, $12 billion and last year, 16
0.9947: [ ] And weeks ago, we also si

In [78]:
biden_scores = calculate_average_scores(biden_speech)

Average Subjective Score: 0.6091680643858312
Average Neutral Score: 0.39083193566408403

Top 10 Subjective Sentences:
0.9956: Trump suggested that Mark Milley, who is a hell of a soldier, Chairman of the Joint Chiefs, was—Mark Milley should, quote, "face death" because he contacted China following January 6 to reassure them that the United States was still stable
0.9941: According to my excellent colleagues from Reuters, the administration is, you know, consulting with allies regarding further sanctions against China if China decides to deliver weapons to Russia
0.9914: Because Chinese steel companies produce a lot more steel than China needs, it ends up dumping the extra steel into the global markets at unfairly low prices
0.9912: The President and Prime Minister then discussed the two Canadian citizens—Michael Kovrig and Michael Spavor—who are unjustly detained by the People's Republic of China
0.9911: Because the CCP does not compete fairly, imposing new tariffs is a necessary step 

In [37]:
vicepresident_biden_scores = calculate_average_scores(vicepresident_biden_speech)

Average Subjective Score: 0.6647560222964374
Average Neutral Score: 0.3352439795358121
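
To compare the speakers side by side, the average subjective scores printed above can be collected in one dict and ranked. The numbers below are copied from the cell outputs; the dict and variable names are only illustrative.

```python
# Average subjective scores copied from the outputs above.
average_subjective = {
    "nixon": 0.6384612598364953,
    "ford": 0.6419007938230437,
    "carter": 0.6137709823428451,
    "reagan": 0.6146424750768584,
    "herbertbush": 0.5798382258281269,
    "clinton": 0.6210600175086957,
    "walkerbush": 0.6137421405494247,
    "obama": 0.634195702759308,
    "trump": 0.693941963181561,
    "biden": 0.6091680643858312,
    "vicepresident_biden": 0.6647560222964374,
}

# Rank from most to least subjective on average.
ranked = sorted(average_subjective.items(), key=lambda kv: kv[1], reverse=True)
for name, score in ranked:
    print(f"{name:>20}: {score:.4f}")
```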
