
Add usdoj blog post #9

Merged
merged 2 commits into from
Apr 3, 2023
Conversation

stephbuon
Member

No description provided.

Member

@antagomir antagomir left a comment


Looks good and the length is also suitable!

I suggest that @pitkant also approve this before release.

@antagomir antagomir requested a review from pitkant April 2, 2023 10:00
Member

@pitkant pitkant left a comment


Looking good, thanks for submitting this!

@pitkant pitkant merged commit d8b1bb6 into rOpenGov:master Apr 3, 2023
@pitkant
Member

pitkant commented Apr 3, 2023

It will take a while longer for the blog post to appear on the website; we're looking into it.

@stephbuon
Member Author

Thank you for your help!!

@stephbuon
Member Author

Hello, @pitkant -- I saw the blog isn't up yet. Is there something I can do to help? Thanks.

@pitkant
Member

pitkant commented May 3, 2023

Hi @stephbuon, we have now figured out the web server rendering problems together with the University IT people. Apologies that this took so long!

I tried re-rendering your blogpost from the .Rmd file and encountered the following problems:

  • Row 32: downloading 100,000 press releases caused, on several occasions, a situation where the download would seemingly stall. When I manually cancelled the process I got the following message:

```
Warning: Error in curl::curl_fetch_memory: Operation was aborted by an application callback
```

I don't know what that warning message means, but I then read the documentation on the usdoj API site:

There is a maximum limit of 50 results per request. If you request a pagesize that is larger than 50, you will receive a response with no more than 50 results. Developers leveraging this API should keep the stability of the API and their own applications in mind. Individual users issuing more than 10 requests per second will experience degraded performance and may be blocked entirely.

For smaller numbers of downloaded press releases (10,000, 20,000, etc.) things worked fine, so I assumed my download was getting throttled. Funnily enough, when I set my location to the US East Coast with a VPN I was able to download the 100K records, albeit very slowly. Maybe the DOJ API discriminates against calls from abroad? Anyway, all this may be more related to the usdoj package than to the context of this blog post, so let's move on...
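Given those documented limits, a throttled, paginated download loop might avoid the stall. This is only a hypothetical sketch against the raw API: the endpoint URL and the `page`/`pagesize` query parameters are my assumptions based on the quoted API documentation, not something taken from the usdoj package.

```r
# Hypothetical sketch: fetch press releases page by page, respecting the
# documented 50-results-per-page cap and staying well under 10 requests/second.
# The endpoint URL and query parameter names are assumptions, not verified.
library(httr)
library(jsonlite)

fetch_page <- function(page, pagesize = 50) {
  resp <- GET("https://www.justice.gov/api/v1/press_releases.json",
              query = list(pagesize = pagesize, page = page))
  stop_for_status(resp)
  fromJSON(content(resp, as = "text", encoding = "UTF-8"))$results
}

fetch_many <- function(n_pages) {
  pages <- vector("list", n_pages)
  for (i in seq_len(n_pages)) {
    pages[[i]] <- fetch_page(i - 1)  # assuming zero-indexed pages
    Sys.sleep(0.15)                  # simple rate limiting between requests
  }
  do.call(rbind, pages)
}
```

If throttling really is the culprit, a loop like this at least fails one page at a time instead of stalling an entire 100K-record request.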

  • Row 50: I wasn't able to replicate the colourful plot_usmap graph even when I was able to download the 100K records. For some reason my earliest_date was Jan 5, 2009 and latest_date was Jan 19, 2009, so maybe the observed differences were too small to be visible. Here is the map:

[attached: rendered plot_usmap image]

  • Rows 99-102: I wasn't able to replicate the described phenomenon of multiple values in a single field: "A single field may contain multiple values. For example, the field "name" contains the (sometimes multiple) US DOJ divisions related to a press release, as shown by lines 7 and 9. A single press release may relate to USAOs across multiple states or may implicate multiple offices." I got the following output:

```r
head(press_releases$name, 10)
 [1] "Office of the Attorney General"
 [2] "Civil Rights Division"
 [3] "Civil Division"
 [4] "Criminal Division"
 [5] "Environment and Natural Resources Division"
 [6] "Office of the Deputy Attorney General"
 [7] "Environment and Natural Resources Division"
 [8] "Tax Division"
 [9] "Criminal Division"
[10] "Tax Division"
```
  • Rows 113-118: I'm not sure if the code here is correct. For example,

```r
state_names <- paste(statepop$full, collapse = "|USAO - ")
```

returns the following:

```
[1] "Alabama|USAO - Alaska|USAO - Arizona|USAO - Arkansas|USAO - California|USAO - Colorado|USAO - Connecticut|USAO - Delaware|USAO - District of Columbia|USAO - Florida|USAO - Georgia|USAO - Hawaii|USAO - Idaho|USAO - Illinois|USAO - Indiana|USAO - Iowa|USAO - Kansas|USAO - Kentucky|USAO - Louisiana|USAO - Maine|USAO - Maryland|USAO - Massachusetts|USAO - Michigan|USAO - Minnesota|USAO - Mississippi|USAO - Missouri|USAO - Montana|USAO - Nebraska|USAO - Nevada|USAO - New Hampshire|USAO - New Jersey|USAO - New Mexico|USAO - New York|USAO - North Carolina|USAO - North Dakota|USAO - Ohio|USAO - Oklahoma|USAO - Oregon|USAO - Pennsylvania|USAO - Rhode Island|USAO - South Carolina|USAO - South Dakota|USAO - Tennessee|USAO - Texas|USAO - Utah|USAO - Vermont|USAO - Virginia|USAO - Washington|USAO - West Virginia|USAO - Wisconsin|USAO - Wyoming"
```

So it's a single item. I think the intention was to prefix each state name with "USAO - "? Because of this, the str_extract call fails on row 115, and probably as a result the code on rows 124-126 returns an empty tibble, which in turn causes the final visualization on rows 171-179 to fail with the following error:

```
Error in `combine_vars()`:
! Faceting variables must have at least one value
Backtrace:
 1. base (local) `<fn>`(x)
 2. ggplot2:::print.ggplot(x)
 4. ggplot2:::ggplot_build.ggplot(x)
 5. layout$setup(data, plot$data, plot$plot_env)
 6. ggplot2 (local) setup(..., self = self)
 7. self$facet$compute_layout(data, self$facet_params)
 8. ggplot2 (local) compute_layout(..., self = self)
 9. ggplot2::combine_vars(data, params$plot_env, vars, drop = params$drop)
```
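For what it's worth, a small change seems to produce what the original code likely intended: prefixing every state name with "USAO - " and joining the results with "|" into a single alternation pattern. This is just my guess at the intent, not a confirmed fix:

```r
library(usmap)    # provides statepop
library(stringr)

# paste0() prepends the prefix to every element before collapsing, so the
# pattern becomes "USAO - Alabama|USAO - Alaska|...|USAO - Wyoming"
state_pattern <- paste0("USAO - ", statepop$full, collapse = "|")

# each name can then be matched against the alternation pattern, e.g.:
str_extract("USAO - Texas, Southern District", state_pattern)
#> [1] "USAO - Texas"
```

With a pattern like this, str_extract would return the matching "USAO - <State>" substring instead of failing on one giant unmatched string.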

If you could re-render the html file and re-upload it to this repository, we could see if things work correctly now. Thanks and sorry for the inconvenience!

@stephbuon
Member Author

stephbuon commented May 8, 2023

Thanks for this @pitkant ! It was meant to read from a csv file, not pull from the API in real time. Let me fix that and send it back to you.

@stephbuon
Member Author

stephbuon commented May 8, 2023

Hi, @pitkant -- The code that pulls 100,000 press releases should have been only for my use (not to be viewed in the blog post). I uploaded a markdown file with echo=F in the map building section so that I would just display the visual (which reads from a CSV file). Does this process not work on your server? Should I do something else?


````
```{r, echo=F, message=FALSE}
library(usmap)
library(lubridate)
library(tidyverse)
library(usdoj)

# press_releases <- doj_press_releases(n_results = 100000, search_direction = "DESC")
# write_csv(press_releases, "press_releases_doj_intro.csv")
press_releases <- read_csv("press_releases_doj_intro.csv")
```
````

@pitkant
Member

pitkant commented May 9, 2023

Hi @stephbuon -- In the case of .Rmd files, the server does not do any rendering; it just displays already-rendered HTML files. In the case of plain .md files it does some basic parsing to display the page, but that does not allow for more complicated blog posts, such as ones with embedded interactive visualisations and so on.

I didn't mean that you should do anything differently. I was just trying to replicate the HTML file on my own computer to see if the output differs from the one that is there now. I don't know if this is significant, but it seems that some older .html files have the YAML front matter left in them whereas yours doesn't. There are also some other minute differences.

For example, compare these two:
usdoj-cran-release/index.en.html
minatutkin-twiitit/index.fi.html

It's clearly a knitting-related issue, but it's hard to say without being able to re-render the HTML file. Maybe you could try using this formatting in the front matter:

```yaml
output:
  blogdown::html_page:
    highlight: tango
```

instead of this

```yaml
output: blogdown::html_page
```

@pitkant
Member

pitkant commented May 11, 2023

I made an attempt at converting the blog post from .Rmd to an .md file and it worked; the blog post is now live here: https://ropengov.org/2023/04/usdoj-cran-release/

Maybe we should still try to fix the .Rmd file somehow; I can't come up with any reason other than the problem being in the YAML front matter.

@pitkant
Member

pitkant commented May 11, 2023

I made some further changes to the website, converting some older blog posts to use standard code fences:

```r
some r code example
```

instead of this

```{% highlight r %}
some r code example
```

and updated config.toml to use our preferred syntax highlighting style, tango, instead of the Hugo default, monokai. (There are some good-looking alternatives too; maybe we can consider those at some point: https://xyproto.github.io/splash/docs/all.html)
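For reference, the highlighting style in Hugo is usually set in config.toml roughly like this (a sketch; the exact nesting may differ depending on the Hugo version and how the site is set up):

```toml
[markup]
  [markup.highlight]
    style = "tango"
```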

The blog post should now look very similar to what it would look like if it were rendered from an .Rmd file. @stephbuon can decide if it's good enough for now.

@stephbuon
Member Author

stephbuon commented May 18, 2023

Thank you, @pitkant -- I really appreciate your help with this.

Is it possible to remove the code before the map of the United States and just show the map of the United States?

In the future I will use the method you used to create this (instead of the .Rmd file).

@pitkant
Member

pitkant commented May 19, 2023

It's just my personal preference, but I think it's nice to have some code examples alongside the visualisations, even if they aren't fully reproducible. Maybe instead of removing it all, the code chunk could be slightly modified to make it clear that it's for illustrative purposes and that reproducing the example as presented would not work due to API limitations, etc.?

Something to this effect:

Original:

```r
# press_releases <- doj_press_releases(n_results = 100000, search_direction = "DESC")
# write_csv(press_releases, "press_releases_doj_intro.csv")
press_releases <- read_csv("press_releases_doj_intro.csv")
```

Modified:

```r
# A NON-REPRODUCIBLE example of downloading a large number of press releases and saving them
# press_releases <- doj_press_releases(n_results = 100000, search_direction = "DESC")
# write_csv(press_releases, "press_releases_doj_intro.csv")
# press_releases <- read_csv("press_releases_doj_intro.csv")
```

But if you wish that all of lines 21-57 (

```r
library(usmap)
library(lubridate)
library(tidyverse)
library(usdoj)

# press_releases <- doj_press_releases(n_results = 100000, search_direction = "DESC")
# write_csv(press_releases, "press_releases_doj_intro.csv")
press_releases <- read_csv("press_releases_doj_intro.csv")

state <- statepop$full
count <- list()
for (state_name in state) {
  count <- append(count, sum(str_count(press_releases$name, state_name)))
}
df <- data.frame(state = unlist(state), count = unlist(count))

earliest_date <- ymd(min(press_releases$date))
earliest_date <- paste0(month(earliest_date, label = TRUE), " ", day(earliest_date), ", ", year(earliest_date))
latest_date <- ymd(max(press_releases$date))
latest_date <- paste0(month(latest_date, label = TRUE), " ", day(latest_date), ", ", year(latest_date))

plot_usmap(data = df, values = "count", color = "#4682b4") +
  scale_fill_continuous(low = "white",
                        high = "#4682b4",
                        name = "n",
                        label = scales::comma) +
  theme(legend.position = "right") +
  labs(title = "US DOJ Press Releases Involving the FBI Corresponding to State",
       subtitle = paste0("Raw Count From ", earliest_date, " to ", latest_date),
       caption = "This plot was generated using data from usdoj. It visualizes the raw count of press releases that are tagged as involving both the FBI and a state's office of the United States Attorney.")
```
) be removed, I can of course do that for you. Some added clarification might then be needed, even though the sentence "Data is cleaned and structured before it is returned as a data frame with fields for the body text, date, title, url, the name of the corresponding division, to name just a few" explains the process pretty well?

@stephbuon
Member Author

Hi, @pitkant ! If you think having non-reproducible code is useful, I take your word for it and am happy to keep it! Good idea adding the disclaimer on top.

I'm happy to publish the blog post (including the disclaimer and code before the first visualization) if you also think it looks good enough.
