No filings available for given year #1

Open
jsta opened this issue Nov 14, 2016 · 18 comments
@jsta

jsta commented Nov 14, 2016

My run of the example README.md code for GetIncome, GetBalanceSheet, and GetCashFlow
does not produce the expected output. For example, running GetIncome("GOOG", 2015) produces:

Error in GetAccessionNo(symbol, year, foreign = FALSE) :
no filings available for given year

openjournals/joss-reviews#119

@sewardlee337
Owner

Thank you for the catch. The example in the README is outdated because of GOOG's change in corporate structure in 2015 -- the SEC website no longer hosts GOOG's 2015 annual report.

I have changed examples in README.md to 2016 accordingly (e.g. GetIncome("GOOG", 2016)).

@jsta
Author

jsta commented Nov 15, 2016

GetIncome("GOOG", 2016) appears to download files to an XBRLcache folder but returns the following error:

Error in fileFromCache(file) :
Error in download.file(file, cached.file, method = "auto", quiet = !verbose) :
cannot download all files

In addition: Warning message:
In download.file(file, cached.file, method = "auto", quiet = !verbose) :
URL 'https://www.sec.gov/Archives/edgar/data/1652044/000165204416000012/goog-20151231_pre.xml': status was '404 Not Found'

@jsta
Author

jsta commented Nov 15, 2016

sessionInfo()

R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] finreportr_1.0.1     devtools_1.12.0.9000

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.7        XML_3.98-1.5       digest_0.6.10      dplyr_0.5.0.9000  
 [5] withr_1.0.2        assertthat_0.1     XBRL_0.99.17       R6_2.2.0          
 [9] DBI_0.5-1          magrittr_1.5       httr_1.2.1         stringi_1.1.2     
[13] lazyeval_0.2.0     curl_2.2           xml2_1.0.0.9001    tools_3.3.2       
[17] stringr_1.1.0      selectr_0.3-0      pkgload_0.0.0.9000 rvest_0.3.2       
[21] memoise_1.0.0      tibble_1.2

@sewardlee337
Owner

Thanks! I will look into it and try to reproduce this error to figure out what is going on.

@sewardlee337
Owner

sewardlee337 commented Nov 16, 2016

I've tried it on Windows and Ubuntu, and it seems to work on my end.

sessionInfo()

R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] finreportr_1.0.1 devtools_1.12.0 

loaded via a namespace (and not attached):
 [1] magrittr_1.5   R6_2.2.0       assertthat_0.1 DBI_0.5-1      tools_3.2.3    withr_1.0.2    dplyr_0.5.0    tibble_1.2    
 [9] curl_2.2       Rcpp_0.12.7    memoise_1.0.0  digest_0.6.10 

@jsta - Could you try deleting the XBRLcache folder and try running GetIncome("GOOG", 2016) again? If that still does not work, could you send me a list of file contents in the folder, so that I may see what's going on?
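
For reference, clearing the cache from within R can be sketched as follows. This is a minimal sketch that assumes the cache is the XBRLcache directory in the current working directory (the folder name mentioned above); clear_xbrl_cache is a hypothetical helper and is not part of finreportr or XBRL. It returns the list of cached files before deleting them, so the contents can still be reported.

```r
# Hypothetical helper: list and then delete a local XBRL cache directory.
clear_xbrl_cache <- function(cache_dir = "XBRLcache") {
  if (!dir.exists(cache_dir)) {
    # Nothing to delete; report an empty file list
    return(invisible(character(0)))
  }
  # Record the contents first so they can be shared for debugging
  cached <- list.files(cache_dir, recursive = TRUE)
  unlink(cache_dir, recursive = TRUE)
  invisible(cached)
}
```

Calling print(clear_xbrl_cache()) would show which files had been cached before the folder was removed.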

@jsta
Author

jsta commented Nov 17, 2016

Still getting an error. I tracked down the source of the error to the line in the GetInstFile function calling the XBRL::xbrlDoAll function. I cannot even run the examples from the man page for XBRL::xbrlDoAll. It fails with the same error.

@sewardlee337
Owner

xbrlDoAll calls a function in the XBRL package.

Can you try running XBRL::xbrlDoAll('https://www.sec.gov/Archives/edgar/data/1288776/000165204416000012/goog-20151231.xml')? This will help me determine if the problem is in the XBRL package.

(This function from the XBRL package will download a lot of files, which you may want to delete afterwards.)

@jsta
Author

jsta commented Nov 17, 2016

That's what I was saying in my earlier comment. I am having trouble with the XBRL package itself. Your example gives me the same error message as I originally reported. Too bad XBRL doesn't have an official GitHub page to report issues.

@jsta
Author

jsta commented Nov 17, 2016

Why is XBRL not in your sessionInfo() under loaded via a namespace?

@sewardlee337
Owner

After I run GetIncome(), XBRL appears under loaded via a namespace:

R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] finreportr_1.0.1

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.7    XML_3.98-1.5   dplyr_0.5.0    assertthat_0.1 XBRL_0.99.17   R6_2.2.0       DBI_0.5-1      magrittr_1.5  
 [9] httr_1.2.1     stringi_1.1.2  curl_2.2       lazyeval_0.2.0 xml2_1.0.0     tools_3.2.3    stringr_1.1.0  selectr_0.3-0 
[17] rvest_0.3.2    tibble_1.2 

I am corresponding with the author of the XBRL package directly via email to get some insight into why GetIncome() doesn't run properly in your R session...

@sewardlee337
Owner

The author of XBRL also runs Ubuntu 16.04. His only suggestion:

Maybe running an:
update.packages()
may help.

If it works on some computers and R sessions, but not others, I suspect a settings or configuration problem. Will continue to investigate.

@sewardlee337
Owner

@jsta - I am still trying to diagnose why the XBRL package does not work on your computer/R session.

Can you type the following into your terminal, and show me the output?

curl -I -v https://www.sec.gov/Archives/edgar/data/1652044/000165204416000012/goog-20151231_pre.xml

I am hoping this will give me more information to diagnose the underlying issue.

@jsta
Author

jsta commented Nov 27, 2016

I think I solved the downloading issue using the fix here. Unfortunately, now I'm getting an error with fixFileName and I'm not familiar enough with Rcpp to debug what's going on in xbrlGetSchemaName.cpp.

Using options(error = recover):

Error in if (substr(file.name, 1, 5) != "http:") { :
argument is of length zero

Enter a frame number, or 0 to exit

1: xbrlDoAll(inst, cache.dir = "XBRLcache", prefix.out = "out", verbose = TRUE)
2: xbrlDoAll.R#30: xbrl$processSchema(xbrl$getSchemaName())
3: XBRL.R#115: cat("Schema: ", file, "\n")
4: xbrl$getSchemaName()
5: XBRL.R#110: fixFileName(dname.inst, .Call("xbrlGetSchemaName", doc.inst, PACK
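
One way this message can arise in R is when the value passed to substr() is empty, so the comparison inside if() has length zero. The sketch below reproduces that behaviour in isolation; it assumes, for illustration only, that the schema name came back as character(0), which the traceback above does not confirm. is_http is a hypothetical guard and not a function in XBRL.

```r
# character(0) stands in for an empty schema name (an assumption for illustration)
file.name <- character(0)

# Comparing a zero-length vector yields logical(0), which `if` cannot evaluate
msg <- tryCatch(
  if (substr(file.name, 1, 5) != "http:") "relative" else "absolute",
  error = function(e) conditionMessage(e)
)
print(msg)

# Hypothetical defensive variant: check the length before comparing
is_http <- function(file.name) {
  length(file.name) == 1 && !is.na(file.name) &&
    substr(file.name, 1, 5) == "http:"
}
```

With a guard like this, an empty schema name would yield FALSE rather than stopping with an error, and the caller could then report the missing name explicitly.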

@jsta
Author

jsta commented Nov 27, 2016

OK, it turns out that the download.file issue described here was the problem. Installing XBRL from my fork with devtools::install_github("jsta/XBRL") fixed the problem completely!

head(GetIncome("GOOG", 2016))

           Metric Units      Amount  startDate    endDate
1        Revenues   usd 55519000000 2013-01-01 2013-12-31
2        Revenues   usd 66001000000 2014-01-01 2014-12-31
3        Revenues   usd 74989000000 2015-01-01 2015-12-31
4 Cost of Revenue   usd 21993000000 2013-01-01 2013-12-31
5 Cost of Revenue   usd 25691000000 2014-01-01 2014-12-31
6 Cost of Revenue   usd 28164000000 2015-01-01 2015-12-31

@sewardlee337
Owner

sewardlee337 commented Nov 28, 2016

@jsta - I've done some research on the Stack Overflow solution. It seems like the issue has to do with the method that download.file() uses to download the XBRL files (see official help page). method = "curl" seems to be what works for you.

I've never had a problem with the default setting method = "auto" after testing the XBRL package on three computers (Windows and Ubuntu), so I suspected that it's an issue with global options settings in your R session. I've tried toggling through different download.file() methods in my options settings, but still couldn't reproduce your bug. For example:

> ### method = "wget"
> getOption("download.file.method")
[1] "wget"
> 
> head(GetIncome("GOOG", 2016))
           Metric Units      Amount  startDate    endDate
1        Revenues   usd 55519000000 2013-01-01 2013-12-31
2        Revenues   usd 66001000000 2014-01-01 2014-12-31
3        Revenues   usd 74989000000 2015-01-01 2015-12-31
4 Cost of Revenue   usd 21993000000 2013-01-01 2013-12-31
5 Cost of Revenue   usd 25691000000 2014-01-01 2014-12-31
6 Cost of Revenue   usd 28164000000 2015-01-01 2015-12-31
>
> ### method = "internal"
> options(download.file.method = "internal")
> head(GetIncome("GOOG", 2016))
           Metric Units      Amount  startDate    endDate
1        Revenues   usd 55519000000 2013-01-01 2013-12-31
2        Revenues   usd 66001000000 2014-01-01 2014-12-31
3        Revenues   usd 74989000000 2015-01-01 2015-12-31
4 Cost of Revenue   usd 21993000000 2013-01-01 2013-12-31
5 Cost of Revenue   usd 25691000000 2014-01-01 2014-12-31
6 Cost of Revenue   usd 28164000000 2015-01-01 2015-12-31
>
> ### method = "auto"
> options(download.file.method = "auto")
> head(GetIncome("GOOG", 2016))
           Metric Units      Amount  startDate    endDate
1        Revenues   usd 55519000000 2013-01-01 2013-12-31
2        Revenues   usd 66001000000 2014-01-01 2014-12-31
3        Revenues   usd 74989000000 2015-01-01 2015-12-31
4 Cost of Revenue   usd 21993000000 2013-01-01 2013-12-31
5 Cost of Revenue   usd 25691000000 2014-01-01 2014-12-31
6 Cost of Revenue   usd 28164000000 2015-01-01 2015-12-31

Therefore, I currently suspect that the problem on your end is caused by a network restriction specific to your environment.

If this is the case, how should we proceed?

@jsta
Author

jsta commented Nov 30, 2016

Hmm, without my XBRL fork I am getting the same error in a Docker instance of this image. I am dialed into a remote server on a separate network.

R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux stretch/sid

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] finreportr_1.0.1 devtools_1.12.0

loaded via a namespace (and not attached):
[1] Rcpp_0.12.8 XML_3.98-1.5 digest_0.6.10 dplyr_0.5.0 withr_1.0.2
[6] assertthat_0.1 XBRL_0.99.17 R6_2.2.0 DBI_0.5-1 git2r_0.16.0
[11] magrittr_1.5 httr_1.2.1 stringi_1.1.2 lazyeval_0.2.0 curl_2.3
[16] xml2_1.0.0 tools_3.3.2 stringr_1.1.0 selectr_0.3-0 rvest_0.3.2
[21] memoise_1.0.0 knitr_1.15.1 tibble_1.2

In my opinion the ideal would be to have more tests and see if travis has the same error.

As far as the JOSS review, I understand that there is not much you can do about an error in a dependency. Now that I have it working with my XBRL fork, I will proceed with the review and I suppose get the opinion of the JOSS editor afterwards...

@jibanes

jibanes commented Dec 29, 2016

I had the same issue on OS X and Debian; "fixing" XBRL's method from 'auto' to 'curl' solved it.

@jarjuk

jarjuk commented Mar 22, 2017

Hello,

I first failed with

xbrl.vars <- xbrlDoAll(inst, verbose = FALSE)

Then, with options(error = recover) set, I used the R debugger to
identify the offending file "https://www.sec.gov/Archives/edgar/data/21344/000002134413000050/ko-20130927.xsd".

I then get errors for:

download.file( "https://www.sec.gov/Archives/edgar/data/21344/000002134413000050/ko-20130927.xsd", "apu.tmp" )
download.file( "https://www.sec.gov/Archives/edgar/data/21344/000002134413000050/ko-20130927.xsd", "apu.tmp", method="auto" )
download.file( "https://www.sec.gov/Archives/edgar/data/21344/000002134413000050/ko-20130927.xsd", "apu.tmp", method="libcurl" )

BUT success for

download.file( "https://www.sec.gov/Archives/edgar/data/21344/000002134413000050/ko-20130927.xsd", "apu.tmp", method="curl" )
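
The comparison above can be scripted so that each method is tried in turn and the outcome is tabulated. This is a minimal sketch: try_download_methods is a hypothetical helper, and its downloader argument exists only so the function can be exercised without network access; by default it calls utils::download.file. Which methods are available depends on the platform and R build.

```r
# Hypothetical helper: attempt one download per method and record the outcome.
try_download_methods <- function(url,
                                 methods = c("auto", "libcurl", "curl", "wget"),
                                 downloader = utils::download.file) {
  ok <- vapply(methods, function(m) {
    dest <- tempfile()
    on.exit(unlink(dest), add = TRUE)  # remove the temporary file afterwards
    status <- tryCatch(
      suppressWarnings(downloader(url, dest, method = m, quiet = TRUE)),
      error = function(e) NA_integer_
    )
    # download.file returns 0 (invisibly) on success
    identical(as.integer(status), 0L)
  }, logical(1))
  data.frame(method = methods, ok = unname(ok), stringsAsFactors = FALSE)
}
```

Running try_download_methods() on the .xsd URL above would show at a glance which methods fail in a given R session.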

After some Googling, I found the issue discussed at http://r.789695.n4.nabble.com/dowload-file-method-quot-libcurl-quot-and-GET-vs-HEAD-requests-td4722037.html

In R 3.2.4, if you ran download.file(method="libcurl"), it issues a
HTTP GET request for the file. However, in R 3.3.0, it issues a HTTP
HEAD request first, and then a GET request. This can result in problems
when the web server gives an error for a HEAD request, even if the
file is available with a GET request.

with NO CAN DO resolution :(

No I don't think there is a way to avoid the HEAD request.

Suggestion: the XBRL package should not hard-code method = "auto", because download.file already defaults to

getOption("download.file.method", default = "auto")

which would allow the user to override the method at will.

BR,
Jukka
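
The suggestion above could look roughly like the following. This is a sketch under stated assumptions, not the XBRL package's actual code: fetch_to_cache and its downloader argument are hypothetical names, and the cache layout (one file per URL basename inside cache_dir) is assumed for illustration.

```r
# Hypothetical cache-aware downloader that honours the user's
# `download.file.method` option instead of fixing method = "auto".
fetch_to_cache <- function(url, cache_dir = "XBRLcache",
                           method = getOption("download.file.method",
                                              default = "auto"),
                           downloader = utils::download.file) {
  if (!dir.exists(cache_dir)) dir.create(cache_dir, recursive = TRUE)
  cached_file <- file.path(cache_dir, basename(url))
  if (!file.exists(cached_file)) {
    status <- downloader(url, cached_file, method = method, quiet = TRUE)
    if (!identical(as.integer(status), 0L)) {
      unlink(cached_file)  # do not leave a partial file in the cache
      stop("download failed for ", url, " using method '", method, "'")
    }
  }
  cached_file
}
```

With a function shaped like this, a user affected by the HEAD request problem could run options(download.file.method = "curl") once per session and leave the package code untouched.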
