HTTP error 403 #20

Open
rrik opened this issue Jun 16, 2021 · 15 comments
Comments

@rrik

rrik commented Jun 16, 2021

Hello,

I am getting a 403 error when attempting the following

> GetIncome("FB", 2016)
Error in fileFromCache(file) :
  Error in download.file(file, cached.file, quiet = !verbose) :
  cannot open URL 'https://www.sec.gov/Archives/edgar/data/1326801/000132680116000043/fb-20151231.xsd'

In addition: Warning message:
In download.file(file, cached.file, quiet = !verbose) :
  cannot open URL 'https://www.sec.gov/Archives/edgar/data/1326801/000132680116000043/fb-20151231.xsd': HTTP status was '403 Forbidden'

Do the source links need updating? Thank you!

@darh78

darh78 commented Jun 20, 2021

Hello,
I'm having a similar issue, but with "404 Not Found":

GetIncome("TSLA", 2020)
Error in fileFromCache(file.inst) : 
  Error in download.file(file, cached.file, quiet = !verbose) : 
  cannot open URL 'https://www.sec.gov/Archives/edgar/data/1318605/000156459020004475/tsla-20191231.xml'

In addition: Warning message:
In download.file(file, cached.file, quiet = !verbose) :
  cannot open URL 'https://www.sec.gov/Archives/edgar/data/1318605/000156459020004475/tsla-20191231.xml': HTTP status was '404 Not Found'

@selgamal

selgamal commented Jun 20, 2021

@darh78 That file doesn't exist. Try:
https://www.sec.gov/Archives/edgar/data/1318605/000156459020004475/tsla-10k_20191231_htm.xml

@rrik That also happens to me with older submissions. It seems to be related to the SEC fair use policy. You can try downloading the file manually and putting it in the cache folder, or you can run the code a few times; it will eventually download.
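
The "run it a few times" suggestion can be sketched in R roughly as follows. This is a minimal sketch, not part of finreportr: `download_with_retry` and its arguments are hypothetical names, and `dest` is whatever location inside your cache folder the file should end up in.

```r
# Hypothetical helper (not part of finreportr): retry a download a few times,
# pausing longer after each failure so the SEC servers are not hammered.
download_with_retry <- function(url, dest, attempts = 5, wait = 2,
                                downloader = utils::download.file) {
  for (i in seq_len(attempts)) {
    # download.file() returns 0 on success and signals an error when the
    # URL cannot be opened (for example on a 403 response).
    status <- tryCatch(
      downloader(url, dest, quiet = TRUE),
      error = function(e) NA_integer_,
      warning = function(w) NA_integer_
    )
    if (identical(as.integer(status), 0L)) {
      return(TRUE)
    }
    if (i < attempts) {
      Sys.sleep(wait * i)  # linear backoff: wait, 2 * wait, 3 * wait, ...
    }
  }
  FALSE
}
```

Called with the URL from the error message and a destination path, it returns `TRUE` as soon as one attempt succeeds and `FALSE` if all attempts fail.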

@PatronMaster

Hi,

I also get the same error:

    if (foreign == FALSE) {
        url <- paste0("http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=",
                      symbol, "&type=10-k&dateb=&owner=exclude&count=100")
    }
    filings <- xml2::read_html(url)

I tried changing count to 1 and it works, so it seems this page detects that we are not a browser and blocks us. We may need to use RSelenium :(

@uramnama

I have been receiving the same error. Is there any workaround?

@smartgamer

Same error here:

CompanyInfo("GOOG")
Error in open.connection(x, "rb") : HTTP error 403.

@ramirezjaime

Same 403 error in all functions:

AnnualReports("TSLA")
Error in open.connection(x, "rb") : HTTP error 403.

@ramirezjaime

R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] edgarWebR_1.1.0 finreportr_1.0.2

loaded via a namespace (and not attached):
[1] xml2_1.3.2 magrittr_2.0.1 tidyselect_1.1.1 rvest_1.0.1 R6_2.5.1 rlang_0.4.11
[7] fansi_0.5.0 stringr_1.4.0 httr_1.4.2 dplyr_1.0.7 tools_4.1.0 utf8_1.2.2
[13] DBI_1.1.1 selectr_0.4-2 ellipsis_0.3.2 assertthat_0.2.1 tibble_3.1.4 lifecycle_1.0.0
[19] crayon_1.4.1 purrr_0.3.4 vctrs_0.3.8 curl_4.3.2 glue_1.4.2 stringi_1.7.4
[25] compiler_4.1.0 pillar_1.6.2 generics_0.1.0 pkgconfig_2.0.3

@j-uchiha

j-uchiha commented Nov 7, 2021

I am also experiencing this problem.

@vsoler

vsoler commented Dec 1, 2021

Here is my workaround to your problem.

The problem is that the SEC wants the scraper to identify itself in what is called the User-Agent header.

Before placing my request for data I execute ...

     options(HTTPUserAgent = "your name here   my_name@domain.com")

The user agent is only remembered during the current session.

With this workaround, everything works fine for me: no more 403 errors.
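
To avoid mistyping the string, the option can be built from its parts and restored afterwards. A minimal sketch: `set_sec_user_agent` is a hypothetical helper name, and the name and email below are placeholders.

```r
# Hypothetical helper: build a "Name email" User-Agent string and set it for
# this session. Returns the previous option invisibly so it can be restored.
set_sec_user_agent <- function(name, email) {
  stopifnot(is.character(name), nzchar(name),
            is.character(email), grepl("@", email, fixed = TRUE))
  old <- options(HTTPUserAgent = paste(name, email))
  invisible(old)
}

old <- set_sec_user_agent("Jane Doe", "jane.doe@example.com")  # placeholders
getOption("HTTPUserAgent")
# ... run GetIncome(), AnnualReports(), etc. here ...
options(old)  # restore the previous User-Agent
```

Because `options()` only lasts for the session, the call has to be repeated in each new session, or placed in a startup file if that suits your setup.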

VS

@eweiss99

I used vsoler's suggestion to use the options statement and I'm still having trouble:

GetIncome("MA", 2020)
Error in fileFromCache(file.inst) : 
  Error in download.file(file, cached.file, quiet = !verbose) : 
  cannot open URL 'https://www.sec.gov/Archives/edgar/data/1141391/000114139120000032/ma-20191231.xml'

In addition: Warning messages:
1: In download.file(file, cached.file, quiet = !verbose) :
  downloaded length 0 != reported length 324
2: In download.file(file, cached.file, quiet = !verbose) :
  cannot open URL 'https://www.sec.gov/Archives/edgar/data/1141391/000114139120000032/ma-20191231.xml': HTTP status was '404 Not Found'

According to the SEC the user-agent must be used in the request header.

@Padiol

Padiol commented Mar 27, 2022

Hi guys,

Any chance of an update that solves the problem here?
I am still running into errors despite using the user agent, but only for specific years.

@billytaipei101

My workaround for this problem was to install two missing packages, 'XBRL' and 'Rcpp'.

@Alex-Sigma

Guys, could you please suggest a current solution for this problem (HTTP error 403)?
Secondly, is this package actively maintained or not?
Thanks in advance!

@riazarbi

riazarbi commented Nov 16, 2022

There are several errors being conflated in this issue.

The 403 errors occur because your client is not authorised. You have not set (or have improperly set) your User-Agent header, so the SEC is denying you access.

The 404 error mentioned by @eweiss99 occurs because the file that finreportr is trying to download does not exist. The finreportr package guesses the file name of the submission file by adding the date to the ticker code (ma-20191231.xml). But, for whatever reason, the filer didn’t name their submission file like that. If you go to the actual accession web page, you see that the file is actually called ma12312019-10xk_htm.xml. This is a legit bug in finreportr because it is not correctly determining the file name.
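
To make the mismatch concrete, here is a small illustration of the naming pattern described above. It is not finreportr's actual source; `guess_instance_name` is a hypothetical function that only reproduces the "ticker plus date" guess.

```r
# Illustration only: NOT finreportr's actual code, just the naming pattern
# described above (lower-case ticker + "-" + period end date as YYYYMMDD).
guess_instance_name <- function(ticker, period_end) {
  paste0(tolower(ticker), "-", format(as.Date(period_end), "%Y%m%d"), ".xml")
}

guessed <- guess_instance_name("MA", "2019-12-31")
actual  <- "ma12312019-10xk_htm.xml"  # name reported on the accession page

guessed            # "ma-20191231.xml"
guessed == actual  # FALSE, hence the 404
```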

IMO the best fix here would be for finreportr to actually download the header file for the accession number, extract the table with the file descriptions, and select the correct file name on the basis of the description.
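
A rough sketch of the selection step, assuming the file table has already been scraped into a data frame. `pick_instance_file`, the column names, and the "INSTANCE" wording in the description are all assumptions for illustration, not something finreportr or the SEC pages are confirmed to use in this exact form.

```r
# Hypothetical sketch of the proposed fix. `files` stands in for the table of
# documents listed for one accession number; the column names and the
# "INSTANCE" wording in the description are assumptions.
pick_instance_file <- function(files) {
  is_xml      <- grepl("\\.xml$", files$document, ignore.case = TRUE)
  is_instance <- grepl("INSTANCE", files$description, ignore.case = TRUE)
  hits <- files$document[is_xml & is_instance]
  if (length(hits) == 0L) {
    stop("no XBRL instance document found in the file table")
  }
  hits[[1]]
}

# Made-up sample rows; only the last file name comes from this thread.
files <- data.frame(
  description = c("10-K", "XBRL TAXONOMY EXTENSION SCHEMA",
                  "EXTRACTED XBRL INSTANCE DOCUMENT"),
  document    = c("ma12312019-10xk.htm", "ma-20191231.xsd",
                  "ma12312019-10xk_htm.xml"),
  stringsAsFactors = FALSE
)

pick_instance_file(files)  # "ma12312019-10xk_htm.xml"
```

Selecting on the description rather than on a guessed name means the lookup no longer depends on how a particular filer chose to name the file.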

I’ve got a bit of momentum here so I’ll try to see if it’s a simple fix and make a pull request.

@matthewgson

@vsoler's answer on

options(HTTPUserAgent = "your name here   my_name@domain.com")

worked like a charm. Hope this can be seen on the main readme page!
