
Some interviews not showing up in full text search #73 #17

Closed
labradford opened this issue Mar 9, 2023 · 8 comments
@labradford

Interviews ingested after December 2020 appear in interview information searches but not in full text search. For example, the interviews in this series do not show up in full text search:

https://oralhistory.library.ucla.edu/?f%5Bseries_facet%5D%5B%5D=Chemical+Entanglements%3A+Oral+Histories+of+Environmental+Illness

Example: a full text search for “Alfaro” returns no results, but the interview does appear in an interview information search:
https://oralhistory.library.ucla.edu/catalog/21198-zz002kpkk3?counter=1&q=alfaro
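The series link above is a faceted search URL (its `f[field][]=value` shape looks like the Blacklight convention, though that is an inference from the URL, not something stated in this issue). A minimal sketch using Ruby's standard `uri` library to decode its query string and confirm which facet value the link applies:

```ruby
require 'uri'

# Query string copied from the series link in the issue.
query = 'f%5Bseries_facet%5D%5B%5D=Chemical+Entanglements%3A+Oral+Histories+of+Environmental+Illness'

# decode_www_form turns "%5B" into "[", "%3A" into ":" and "+" into a space,
# returning an array of [name, value] pairs.
params = URI.decode_www_form(query)

params.each { |name, value| puts "#{name} = #{value}" }
# prints: f[series_facet][] = Chemical Entanglements: Oral Histories of Environmental Illness
```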

@crisr15

crisr15 commented Mar 13, 2023

Confirm with TKay whether this is still active.

@crisr15 crisr15 self-assigned this Mar 13, 2023
@crisr15

crisr15 commented Mar 15, 2023

Confirmed this is still active and needs to be fixed.

@aprilrieger

```ruby
OralHistoryItem.import_single('21198-zz002kpkk3')
```

@aprilrieger

aprilrieger commented Mar 26, 2023

We need:

```ruby
item.attribute['transcript_json_t'] << {
  "transcript_t": transcript
}
```

This should happen once the IndexPdfJob has run, or during the job itself.
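A minimal sketch of that append, assuming a hypothetical `SketchItem` class whose `attributes` is a plain Hash (the real `OralHistoryItem` may store its attributes differently). Two details seem worth checking in the real code: `<<` raises `NoMethodError` if `transcript_json_t` has never been set (it is `nil`), and in Ruby the literal `"transcript_t": transcript` creates the Symbol key `:transcript_t`, not the String key `'transcript_t'`. The sketch guards with `||=` and uses a String key:

```ruby
# Hypothetical stand-in for OralHistoryItem: just a hash of Solr attributes.
class SketchItem
  attr_reader :attributes

  def initialize(attributes = {})
    @attributes = attributes
  end

  # Append one transcript entry. ||= creates the array the first time,
  # so items that have never had a transcript don't hit nil << ...
  def add_transcript(text)
    (attributes['transcript_json_t'] ||= []) << { 'transcript_t' => text }
  end
end

item = SketchItem.new('id' => '21198-zz002kpkk3')
item.add_transcript('Interview with Evelin Alfaro')
p item.attributes['transcript_json_t']
```

Whether a Symbol or a String key matters depends on how the document is serialized before it reaches Solr, which this issue doesn't show.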

In the job:

```ruby
result = SolrService.extract(path: tmp_file.path)
transcript = result['file'].to_s.strip
```

For this interview, `result` is:

```
{"file"=>"\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nInterview with Evelin Alfaro \n \n\nSESSION 1 (8/17/2020) \n \n\nTimed Log \n \n[00:00:00] Introduction and gives her permission to record the interview. Says she is a \n\nmember of United and Active Women (MUA) and the California Domestic \nWorkers Coalition. She is forty years old. \n\n[00:02:16] Born in Quetzaltenango, Guatemala to a family of seven children. Her dad \nis a carpenter and her mom a stay-at-home mom. They also cultivate corn. \nShe lives close to a forest and a river. She arrives in San Francisco in 2009. \n\n[00:06:57] Wants to talk about the importance of domestic work and she also wants to \nlearn about the study because there is contamination and dangers in her \nhome country as well. For example, they use chemicals on their corn in \nGuatemala without protections like gloves or masks. \n\n[00:09:06] When she was a kid, the corn products irritated her hands – and the smell \nalso irritated her nose. She did not speak to her siblings about the effects. \n\n[00:12:36] Her dad had to use toxic products when he painted houses, and they were \neven worse when it was hot out. She speculates perhaps they affected him. \n\n[00:16:12] Learns to take precautions when cleaning houses like not mixing Clorox \nwith Ajax, opening windows, etc. to avoid reactions. She doesn’t go to the \ndoctor because she doesn’t want to miss a day of work and because she \ncannot pay. \n\n[00:23:06] Continues to work with some toxic products, but there are also some clients \nwho accept vinegar. Symptoms include skin irritation and red eyes. She \nbuys her own protective equipment. \n\n[00:28:53] She likes MUA because they share information and make workers feel more \nsecure about the products they’re using. She declares it is a type of power to \nbe able to participate, receive training, and spread awareness. \n\n[00:33:20] A leader in the campaign for SB 1257. She especially likes labor rights. She \nshares her testimony with legislators. She feels proud of her work and her \nefforts informing about and expanding the rights of domestic workers. \n\n[00:38:22] Speaks about the connection between domestic work and domestic violence \nbecause oftentimes workers have to spend time in private homes. \n\n[00:39:48] Talks about the connection between working-class jobs and immigration \nstatus. Without papers, they do not feel the freedom to say look, I am not \ngoing to do this because it puts my life at risk. \n\n[00:41:08] Lack of protections during the pandemic. In general, she emphasizes the \nneed to take care of domestic workers the same as those in any other type of \njob. \n\n\n\n[00:49:38] Final words, logistics, and conclusion. Thank you very much for your time. \n\n \n\n\n\n"}
```
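The extracted text starts with a long run of newlines, which is what the job's `.to_s.strip` deals with. A minimal sketch on a shortened copy of the result above; the empty-text guard is a suggestion for the job, not existing code:

```ruby
# Shortened copy of what SolrService.extract returned for this interview:
# a run of blank lines, then the transcript text.
result = { 'file' => "\n\n\n\nInterview with Evelin Alfaro \n \n\nSESSION 1 (8/17/2020) \n\n\n" }

# Same expression as the job: to_s turns a missing key (nil) into "",
# and strip removes leading and trailing whitespace, including newlines.
transcript = result['file'].to_s.strip

# Suggested guard (an assumption, not in the current job): skip empty text.
if transcript.empty?
  puts 'No text extracted; nothing to index'
else
  puts transcript.lines.first.strip # prints: Interview with Evelin Alfaro
end
```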

After running `item.index_record`, I followed the item and saw that the information had not been indexed into Solr. For reference, `index_record` is:

```ruby
def index_record
  SolrService.add(self.to_solr)
  #TODO allow for search capturing
  SolrService.commit
end
```
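`index_record` sends whatever `to_solr` returns, so the transcript can only become searchable if `to_solr` includes it. A minimal sketch to make that check concrete, assuming a hypothetical in-memory `FakeSolr` in place of `SolrService` and a hypothetical `IndexableItem` whose `to_solr` simply copies its attributes hash (the real `to_solr` may build the document differently):

```ruby
# Hypothetical stand-in for SolrService that records what would be sent to Solr.
class FakeSolr
  attr_reader :docs, :commits

  def initialize
    @docs = []
    @commits = 0
  end

  def add(doc)
    @docs << doc
  end

  def commit
    @commits += 1
  end
end

# Hypothetical item with the same index_record shape as in the issue,
# but with the Solr service injected so it can be inspected.
class IndexableItem
  attr_reader :attributes

  def initialize(attributes, solr)
    @attributes = attributes
    @solr = solr
  end

  # Assumption: the Solr document is a copy of the attributes hash.
  def to_solr
    attributes.dup
  end

  def index_record
    @solr.add(to_solr)
    @solr.commit
  end
end

solr = FakeSolr.new
item = IndexableItem.new({ 'id' => '21198-zz002kpkk3' }, solr)
item.attributes['transcript_json_t'] = [{ 'transcript_t' => 'Interview with Evelin Alfaro' }]
item.index_record
puts solr.docs.last.key?('transcript_json_t') # prints: true
```

If the real `to_solr` selects fields explicitly, the same inspection would show whether `transcript_json_t` is dropped before it reaches Solr.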

@DiemBTran
Blocked for QA until we hear back from John about the HTTP basic auth environment variables being set in the deploy.

@DiemBTran DiemBTran added and removed the blocked label (Cannot proceed - may be dependent on other work) Mar 28, 2023
@DiemBTran
Needs further review.

Tested as follows:

  1. I let the Run Importer and Delete Removed Entries run overnight on staging (screenshot).
  2. I did a full text search for a term that appears both on the Evelin Alfaro interview show page and in its PDF transcript (“California Domestic Workers Coalition”) (screenshot).
  3. That search did not return the interview we’re looking for:
    1. screenshot of the search results page for the term
    2. screenshot of the search results page for the term, with the search also limited to the interview's series (“Chemical Entanglements: Oral Histories of Environmental Illness”)
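The manual check above could be scripted by looking for the interview's id in the search results. A minimal sketch, assuming the query is sent to Solr directly and returns Solr's standard JSON response shape (`response.numFound`, `response.docs`); the sample body below is made up for illustration, not captured from staging:

```ruby
require 'json'

# Returns true if a document with the given id appears in a Solr JSON response body.
def interview_in_results?(body, id)
  docs = JSON.parse(body).dig('response', 'docs') || []
  docs.any? { |doc| doc['id'] == id }
end

# Illustrative response body (hypothetical, not real staging output).
body = '{"response":{"numFound":1,"start":0,"docs":[{"id":"21198-zz002kpkk3"}]}}'

puts interview_in_results?(body, '21198-zz002kpkk3') # prints: true
puts interview_in_results?(body, 'some-other-id')    # prints: false
```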

@labradford

The IndexPdfTranscriptJob is failing on the test server https://oralhistory-test.library.ucla.edu/delayed_job/failed

@aprilrieger

This is live on the site.

Labels
None yet
Projects
Status: Done
Development

No branches or pull requests

5 participants