This repository has been archived by the owner on Sep 9, 2022. It is now read-only.

invabs2abs is inserting "NA" into abstracts #8

Closed
crew102 opened this issue Jan 7, 2019 · 4 comments

Collaborator

crew102 commented Jan 7, 2019

The inverted index that the API serves sometimes has missing positions. For example, the index may provide tokens for locations 0, 2, and 3, but not for location 1 (e.g., "InvertedIndex":{"i":[0],"big":[2],"dogs":[3]}).

invabs2abs(), however, assumes that there is a token for each location. This results in NAs getting inserted into the abstract text.
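
To illustrate the mechanism, here is a minimal base R sketch. It is not the package's actual code: `rebuild_naive()` and `rebuild_skip_gaps()` are hypothetical helpers, and `inv_index` is a hand-built list mirroring the example index above. The naive version assumes one token per position and so leaves an NA at the gap; the second version orders tokens by position and ignores positions the index never mentions, which is one possible way to avoid the problem.

```r
# Hypothetical inverted index mirroring the example above: position 1 is absent.
inv_index <- list(i = 0L, big = 2L, dogs = 3L)

# Naive reconstruction (an assumption about the general approach, not the
# package's actual code): allocate one slot per position, then fill by index.
rebuild_naive <- function(idx) {
  positions <- unlist(idx, use.names = FALSE)
  tokens <- rep(names(idx), lengths(idx))
  out <- rep(NA_character_, max(positions) + 1L)
  out[positions + 1L] <- tokens  # API positions are 0-based, R is 1-based
  paste(out, collapse = " ")     # paste() turns the unfilled slot into "NA"
}

# Gap-tolerant variant: order the tokens by position and never allocate
# slots for positions the index does not mention.
rebuild_skip_gaps <- function(idx) {
  positions <- unlist(idx, use.names = FALSE)
  tokens <- rep(names(idx), lengths(idx))
  paste(tokens[order(positions)], collapse = " ")
}

rebuild_naive(inv_index)
#> [1] "i NA big dogs"
rebuild_skip_gaps(inv_index)
#> [1] "i big dogs"
```

Skipping gaps silently drops whatever token was at the missing position, so the rebuilt abstract may still be incomplete; it just no longer contains a literal "NA".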

library(microdemic)
library(jsonlite)

res <- ma_abstract("And(Composite(AA.AuN=='jaime teevan'),Y>2012)", key = Sys.getenv("MAG_KEY"))
res$abstract[grepl("\\bNA\\b", res$abstract)]
#> [1] "The queries people issue to a search engine and the results clicked following a query change over time. For example, after the earthquake in Japan in March 2011, the query NA japan NA spiked in popularity and people issuing the query were more likely to click government-related results than they would prior to the earthquake. We explore the modeling and prediction of such temporal patterns in Web search behavior. We develop a temporal modeling framework adapted from physics and signal processing and harness it to predict temporal patterns in search behavior using smoothing, trends, periodicities, and surprises. Using current and past behavioral data, we develop a learning procedure that can be used to construct models of users' Web search activities. We also develop a novel methodology that learns to select the best prediction model from a family of predictive models for a given query or a class of queries. Experimental results indicate that the predictive models significantly outperform baseline models that weight historical evidence the same for all queries. We present two applications where new methods introduced for the temporal modeling of user behavior significantly improve upon the state of the art. Finally, we discuss opportunities for using models of temporal dynamics to enhance other areas of Web search and information retrieval."
#> [2] "The physical constraints of smartwatches limit the range and complexity of tasks that can be completed. Despite interface improvements on smartwatches, the promise of enabling productive work remains largely unrealized. This paper presents NA WearWrite , a system that enables users to write documents from their smartwatches by leveraging a crowd to help translate their ideas into text. WearWrite users dictate tasks, respond to questions, and receive notifications of major edits on their watch. Using a dynamic task queue, the crowd receives tasks issued by the watch user and generic tasks from the system. In a week-long study with seven smartwatch users supported by approximately 29 crowd workers each, we validate that it is possible to manage the crowd writing process from a watch. Watch users captured new ideas as they came to mind and managed a crowd during spare moments while going about their daily routine. WearWrite represents a new approach to getting work done from wearables using the crowd."

Created on 2019-01-07 by the reprex package (v0.2.0.9000).

Session info
devtools::session_info()
#> Session info -------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.4.4 (2018-03-15)
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  tz       America/New_York            
#>  date     2019-01-07
#> Packages -----------------------------------------------------------------
#>  package    * version date       source         
#>  backports    1.1.2   2017-12-13 cran (@1.1.2)  
#>  base       * 3.4.4   2018-03-16 local          
#>  compiler     3.4.4   2018-03-16 local          
#>  crul         0.7.0   2019-01-04 CRAN (R 3.4.4) 
#>  curl         3.2     2018-03-28 CRAN (R 3.4.4) 
#>  datasets   * 3.4.4   2018-03-16 local          
#>  devtools     1.13.6  2018-06-27 cran (@1.13.6) 
#>  digest       0.6.17  2018-09-12 cran (@0.6.17) 
#>  evaluate     0.10.1  2017-06-24 cran (@0.10.1) 
#>  graphics   * 3.4.4   2018-03-16 local          
#>  grDevices  * 3.4.4   2018-03-16 local          
#>  htmltools    0.3.6   2017-04-28 cran (@0.3.6)  
#>  httpcode     0.2.0   2016-11-14 CRAN (R 3.4.4) 
#>  jsonlite   * 1.5     2017-06-01 cran (@1.5)    
#>  knitr        1.20    2018-02-20 cran (@1.20)   
#>  magrittr     1.5     2014-11-22 cran (@1.5)    
#>  memoise      1.1.0   2017-04-21 CRAN (R 3.4.4) 
#>  methods    * 3.4.4   2018-03-16 local          
#>  microdemic * 0.4.0   2018-10-25 CRAN (R 3.4.4) 
#>  pillar       1.2.3   2018-05-25 cran (@1.2.3)  
#>  R6           2.2.2   2017-06-17 cran (@2.2.2)  
#>  Rcpp         0.12.18 2018-07-23 cran (@0.12.18)
#>  rlang        0.2.2   2018-08-16 cran (@0.2.2)  
#>  rmarkdown    1.10    2018-06-11 cran (@1.10)   
#>  rprojroot    1.3-2   2018-01-03 cran (@1.3-2)  
#>  stats      * 3.4.4   2018-03-16 local          
#>  stringi      1.2.4   2018-07-20 cran (@1.2.4)  
#>  stringr      1.3.1   2018-05-10 cran (@1.3.1)  
#>  tibble       1.4.2   2018-01-22 cran (@1.4.2)  
#>  tools        3.4.4   2018-03-16 local          
#>  triebeard    0.3.0   2016-08-04 CRAN (R 3.4.4) 
#>  urltools     1.7.1   2018-08-03 CRAN (R 3.4.4) 
#>  utils      * 3.4.4   2018-03-16 local          
#>  withr        2.1.2   2018-03-15 CRAN (R 3.4.4) 
#>  yaml         2.2.0   2018-07-25 cran (@2.2.0)

Happy to issue a PR for this.

crew102 added a commit to crew102/microdemic that referenced this issue Jan 7, 2019
sckott added the bug label Jan 8, 2019
sckott added this to the v0.5 milestone Jan 8, 2019
Contributor

sckott commented Jan 8, 2019

thanks for another report @crew102

if you could submit a fix PR that'd be great

Contributor

sckott commented Jan 8, 2019

do submit separate PRs for the two issues if you could

Collaborator Author

crew102 commented Jan 8, 2019

Yep no problem, I'll submit separate PRs. I should probably have them both submitted sometime later this week, after I've had a chance to ramp up on vcr.

Contributor

sckott commented Jan 8, 2019

#9 fixed

sckott closed this as completed Jan 8, 2019