Path problems with system calls on Windows #6

Closed
Phenomniverse opened this issue Nov 22, 2022 · 30 comments
Assignees
Labels
bug Something isn't working

Comments

@Phenomniverse

Hi,
I'm struggling to get read_chroms to accept a path_out, and it's unclear what formatting it expects. It seems to add a "/" to the front of whatever I specify as path_out. The default behaviour is supposed to save the output in the current working directory, but it doesn't do that either: it asks whether to save it in 'temp', but then can't find the 'temp' folder, even when I manually create it as a subfolder of the working directory. I would prefer to give the full path from the drive letter (e.g. "C:/ ... "), but this doesn't work because it puts a "/" in front of the drive letter. See below, where I've used getwd() to provide the path to the current directory.

read_chroms(paste0(archive_sample_dirs[1],'/FID1A.ch'),find_files=FALSE, path_out=getwd(), export=TRUE, parser="openchrom", format_in='csd',export_format = 'csv')

Error in read_chroms(paste0(archive_sample_dirs[1], "/FID1A.ch"), find_files = FALSE,  : 
  The export directory '/W:/ARL/Analytical/OPERATOR METHOD TEMPLATES/chemstation data/' does not exist.
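The error above shows a slash prepended to the drive letter (`/W:/...`). As a rough illustration of the normalisation involved (a minimal sketch, not chromConverter's actual fix), the hypothetical helper `fix_windows_path()` below converts backslashes to forward slashes and strips a leading slash that precedes a drive letter, using only base R.

```r
# Hypothetical helper, not chromConverter's code: tidy a Windows path before
# checking that an export directory exists.
fix_windows_path <- function(path) {
  # Convert backslashes to forward slashes (R accepts "/" on Windows too)
  path <- gsub("\\", "/", path, fixed = TRUE)
  # Drop a "/" that has been prepended to a drive letter: "/W:/..." -> "W:/..."
  sub("^/([A-Za-z]:)", "\\1", path)
}

fix_windows_path("/W:/ARL/Analytical/OPERATOR METHOD TEMPLATES/chemstation data/")

# The existence check should then pass for a real folder, e.g. the working dir
dir.exists(fix_windows_path(getwd()))
```

Paths that don't start with "/" plus a drive letter (including ordinary Unix paths) pass through the second step unchanged.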

If I set path_out = "", it gets past this problem, at least long enough to hit a second one: it can't find the path to the OpenChrom command line. It seems to want the path including the file name but without the extension, e.g. "C:/Users/ ... /Programs/OpenChrom/openchrom". If I type this in, it moves on to another error (I suspect it's back to the first error). It seems to save this path in the path_to_openchrom_commandline.txt file, but it doesn't seem to be able to find OpenChrom unless I manually type it in each time. When I do so, I get:

Error in write_xml.xml_document(x, file = path_xml) : Error closing file
In addition: Warning message:
In write_xml.xml_document(x, file = path_xml) : Permission denied [1501]
@ethanbass
Owner

ethanbass commented Nov 22, 2022 via email

@ethanbass
Owner

ethanbass commented Nov 22, 2022

Regarding the OpenChrom path, I think on Windows you need the full path to the .exe file, including the extension. Unfortunately, this is not totally consistent between operating systems, and I should probably update the docs to reflect it more clearly. Once you provide the path, it should be saved so that you don't have to provide it again when you reopen the package, as long as you don't move or delete your OpenChrom installation.
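Here is a minimal sketch of that check, assuming the saved path may be missing the `.exe` extension. `resolve_openchrom_path()` is a hypothetical helper, not part of chromConverter, and the install path in the example is made up.

```r
# Hypothetical helper, not part of chromConverter: on Windows, make sure the
# OpenChrom path points at the executable itself, extension included.
resolve_openchrom_path <- function(path, os = .Platform$OS.type) {
  # Normalise separators so the extension check sees a clean path
  path <- gsub("\\", "/", path, fixed = TRUE)
  # Append ".exe" only on Windows, and only if it is not already there
  if (identical(os, "windows") && tolower(tools::file_ext(path)) != "exe") {
    path <- paste0(path, ".exe")
  }
  path
}

# Example install location (hypothetical)
exe <- resolve_openchrom_path("C:/Program Files/OpenChrom/openchrom", os = "windows")
file.exists(exe)  # should be TRUE only on a machine where OpenChrom lives there
```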

@ethanbass ethanbass added the bug Something isn't working label Nov 22, 2022
@ethanbass ethanbass self-assigned this Nov 22, 2022
@Phenomniverse
Author

Thanks Ethan I'll try it out tomorrow when I'm back in my office. Would you mind giving me an overview of the command line arguments that OpenChrom requires to convert a csd file to csv? I couldn't find any OpenChrom documentation on this point.

@ethanbass
Owner

ethanbass commented Nov 22, 2022

What you have looks good for converting just one file.

read_chroms(paths = <>, find_files=FALSE, path_out=getwd(), export=TRUE, parser="openchrom", format_in='csd', export_format = 'csv')

Theoretically, in the latest version you shouldn't need the find_files argument either, but it can't hurt to be explicit. I just added a new set of parsers from the rainbow package for Python that should theoretically work as well for Agilent files, but I haven't tested them on Windows yet. You should also be able to call these from read_chroms as follows:

read_chroms(paths = <>, find_files=FALSE, path_out=getwd(), export=TRUE, parser="rainbow", format_in='chemstation', data_format = "wide", export_format = 'csv')

You may need to run configure_rainbow() first. It's supposed to install automatically, but seems to not work so well on Windows.

@ethanbass
Owner

ethanbass commented Nov 22, 2022

Oh you meant for OpenChrom, sorry. You need to provide an xml batch file to the OpenChrom command line to access the parsers. Then you run openchrom -cli -batchfile <path_to_batchfile>. The main work that my R function is doing under the hood is to write the batch file. It has to include all the paths to the files you want to convert at "InputEntries" and then it has ProcessEntries with the commands you want OpenChrom to carry out. The ProcessEntry you need to convert csd to csv is csd.export.org.eclipse.chemclipse.csd.converter.supplier.csv. You can look at the source code for the call_openchrom function to see how it is constructing the batch files (https://github.com/ethanbass/chromConverter/blob/master/R/call_openchrom.R). It's not very well documented anywhere as far as I've been able to find.
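The batch-file approach can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation. Only `InputEntries`, `ProcessEntries`, the converter id `csd.export.org.eclipse.chemclipse.csd.converter.supplier.csv` and the `-cli -batchfile` flags come from the comment above. The root element name, the `outputFolder` attribute and the helper name `make_batch_xml()` are assumptions, so check `call_openchrom.R` for the schema OpenChrom actually expects.

```r
# Sketch only. From the thread: InputEntries, ProcessEntries, the csd -> csv
# converter id, and the "-cli -batchfile" flags. The root element name and the
# ProcessEntry attributes below are assumptions; R/call_openchrom.R in
# chromConverter shows the layout OpenChrom actually accepts.
make_batch_xml <- function(files, path_out,
                           process_id = "csd.export.org.eclipse.chemclipse.csd.converter.supplier.csv") {
  # One InputEntry per file to convert
  inputs <- paste0("    <InputEntry><![CDATA[", files, "]]></InputEntry>",
                   collapse = "\n")
  paste0('<?xml version="1.0" encoding="UTF-8"?>\n',
         "<BatchProcessJob>\n",
         "  <InputEntries>\n", inputs, "\n  </InputEntries>\n",
         "  <ProcessEntries>\n",
         '    <ProcessEntry id="', process_id,
         '" outputFolder="', path_out, '"/>\n',
         "  </ProcessEntries>\n",
         "</BatchProcessJob>")
}

# Example input paths (hypothetical)
files <- c("W:/data/RUN1.D/FID1A.ch", "W:/data/RUN2.D/FID1A.ch")
batch_file <- tempfile(fileext = ".xml")
writeLines(make_batch_xml(files, path_out = tempdir()), batch_file)

# Arguments for: openchrom -cli -batchfile <path_to_batchfile>
args <- c("-cli", "-batchfile", shQuote(batch_file))
# system2("path/to/openchrom.exe", args)  # uncomment where OpenChrom is installed
```

Keeping the XML construction in its own function also matches the idea of splitting the batch-file constructor out so that it can be called separately.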

@ethanbass
Owner

ethanbass commented Nov 22, 2022

(Maybe I should split the batch file constructor code off into a separate function in case people want to call that separately to just make the batch files). I think that might make for cleaner code anyway

@ethanbass
Owner

Also, if you'd be open to sharing one of those FID files with me, that would be helpful for testing the package. I don't think I have any Agilent FID files in my little collection yet.

@Phenomniverse
Author

Quoting @ethanbass: "I just pushed a patch to the main branch that I think should solve this issue (070f297). It would be great if you can test it for me."

I reinstalled the package from github but the path_out issue still occurs. The error message no longer includes the path_out value, so I can't confirm whether this is because of the leading slash still being present or if there is another issue.

>   read_chroms(paste0(archive_sample_dirs[1],'/FID1A.ch'),find_files=FALSE, path_out=data_dump,export=TRUE, parser="openchrom", format_in='csd',export_format = 'csv')
Error in (function (files, path_out, format_in, export_format = c("csv",  : 
  'path_out' not found. Make sure directory exists.

Not specifying a path_out results in being prompted to accept export to 'temp' directory, which also doesn't work:

>   read_chroms(paste0(archive_sample_dirs[1],'/FID1A.ch'),find_files=FALSE,export=TRUE, parser="openchrom", format_in='csd',export_format = 'csv')
Export directory not specified! Export files to `temp` directory (y/n)?y
Error in (function (files, path_out, format_in, export_format = c("csv",  : 
  'path_out' not found. Make sure directory exists.

Quoting @ethanbass: "I just added a new set of parsers from the rainbow package for Python that should theoretically work as well for Agilent, but I haven't tested them out on Windows yet."

I attempted this (I ran the configure_rainbow() function first, but I don't think it was necessary, I think that ran automatically when I loaded the package). This is the result:

>   read_chroms(paste0(archive_sample_dirs[1],'/FID1A.ch'), find_files=FALSE, path_out=getwd(), export=TRUE, parser="rainbow", format_in='chemstation', data_format = "wide", export_format = 'csv')
Warning in read_chroms(paste0(archive_sample_dirs[1], "/FID1A.ch"), find_files = FALSE,  :
  Error in converter(file) : could not find function "converter"

The following chromatograms could not be interpreted: 1
list()

Quoting @ethanbass: "(Maybe I should split the batch file constructor code off into a separate function in case people want to call that separately to just make the batch files). I think that might make for cleaner code anyway."

Sounds like a good idea to me.

Quoting @ethanbass: "Also, if you'd be open to sharing one of those FID files with me, that would be helpful for testing the package. I don't think I have any FID files from Agilent in my little collection yet."

I should be able to find a chemstation data file that is okay to share. What's the best way to get it to you?

@ethanbass
Owner

ethanbass commented Nov 23, 2022

Ahh sorry this still isn't working. Thank you for the detailed report!

I'm pretty sure I know why the rainbow parser wasn't working, and I just patched it. Fixing the path issue on Windows is going to be a bit trickier. I'll need to spend some time troubleshooting on the Windows machine in the lab, which I won't be able to do until next week because of the holiday. I clearly introduced a bug somewhere, but I can't quite figure out what the problem is. The backslashes in Windows paths cause a lot of problems in R...

If you want to just email me an example file (at ethanbass@gmail.com), that would be great. I can also confirm for you whether rainbow is able to read it.

@ethanbass
Owner

ethanbass commented Nov 23, 2022

So I actually messed around with the paths a little more just now and pushed another update to fix the way the paths are parsed on Windows. I have the OpenChrom parser working again on my Windows 10 computer in the lab. Might be worth another look when you get a chance. I also confirmed that the rainbow parser seems to be running smoothly now at least on my installation of Windows 10.

@Phenomniverse
Author

Ok, so this is a bit weird, but the path_out option seems to work for the rainbow parser but not for the openchrom one. However, the rainbow parser is returning NULL rather than collecting the data from the .ch file, although it does successfully create an empty .csv file in the working directory. So that's progress. Another issue that I foresee once the parser works properly is that the output .csv file is named after the input .ch file, so if read_chroms iterates over a list of input files that all have the same name (in different directories), I'll only end up with a .csv of the last .ch file evaluated. Chemstation saves its raw data as FID1A.ch by default, and I have thousands of these files, so renaming the input files is not ideal.

>   read_chroms(paste0(archive_sample_dirs[1],'/FID1A.ch'),find_files=FALSE, path_out=getwd(),export=TRUE, parser="openchrom", format_in='csd',export_format = 'csv')
Error in (function (files, path_out, format_in, export_format = c("csv",  : 
  'path_out' not found. Make sure directory exists.
>   read_chroms(paste0(archive_sample_dirs[1],'/FID1A.ch'), find_files=FALSE, path_out=getwd(), export=TRUE, parser="rainbow", format_in='chemstation', data_format = "wide", export_format = 'csv')
$FID1A
NULL

@Phenomniverse
Author

Quoting @ethanbass: "If you want to just email me an example file"

Email sent :)

@ethanbass
Owner

Thank you for the update and for sending along the files. I'm glad to hear that the rainbow parser is running for you now (even though it isn't actually reading your files). Unfortunately, I checked out your files and it doesn't seem like any of the parsers that are currently included with the package are able to read them properly. I know Roderick Bovee who develops the entab package is working on an update for the agilent FID parser (bovee/entab#42) and it seems like the rainbow developers might also be interested in your file (since they have a parser for agilent FID files that is throwing an error on your file). Do you know what version of Chemstation your files are created by? With your permission, I'd be happy to pass your files along to Roderick and the rainbow people and they might be able to update the parsers to properly read your files.

Regarding the path issue with OpenChrom, I'm pretty perplexed and don't really understand why this is happening since it's working fine on the other Windows computer I have access to. I will have to look into this further.

Regarding the file name issue, if you have the files in the original .D directory you can change format_in to agilent_d and it will read in the filenames from the directory name. This is still a good point though. I may think about adding an additional argument to read the directory names when using the format_in = chemstation or at least have the parser read the directory names into the metadata for the file. I will have to think a little more about how to best implement this.
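One way to avoid the collision raised earlier (thousands of files all named `FID1A.ch`) is to derive export names from the enclosing `.D` folder. This is a sketch that assumes each file sits directly inside its run's `.D` directory. `export_names()` is hypothetical and is not chromConverter's API. `make.unique()` guards against the same `.D` name appearing under two different parent folders.

```r
# Hypothetical helper, not chromConverter's code: build export file names
# from the enclosing .D directory so many FID1A.ch files don't overwrite
# one another.
export_names <- function(files, ext = "csv") {
  run_dir <- basename(dirname(files))                     # e.g. "SAMPLE01.D"
  run_id  <- sub("\\.D$", "", run_dir, ignore.case = TRUE) # e.g. "SAMPLE01"
  signal  <- tools::file_path_sans_ext(basename(files))    # e.g. "FID1A"
  # make.unique() appends ".1", ".2", ... if two runs share a folder name
  paste0(make.unique(paste(run_id, signal, sep = "_")), ".", ext)
}

# Example paths (hypothetical)
files <- c("W:/archive/2014/SAMPLE01.D/FID1A.ch",
           "W:/archive/2014/SAMPLE02.D/FID1A.ch")
export_names(files)
```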

Thank you for all the feedback and please let me know if it would be OK to share the files.

@ethanbass
Owner

It seems that your file is from ChemStation B.04, and maybe the rainbow parser can only handle files from ChemStation B.03.

@Phenomniverse
Author

I'll confirm the chemstation version when I'm back in the office tomorrow. It's possible we have different versions on different machines so I might be able to try more than one version. I will email you re: sharing FID files.

@ethanbass
Owner

OK, sounds good. By the way, I am making some progress figuring out the binary format. I don't usually do this myself, but I decided to try, and I am getting back a chromatogram that looks a lot like the one you sent me. So I'm pretty sure it shouldn't be too hard to write a parser directly in R or to adapt the one in the rainbow package.

@ethanbass
Owner

I added a new parser in ba59590 to read these agilent FID files natively in R. It seems to work for both the 179 type files and 181 type files. If anyone reading this has a 180 type file to see where it falls along this spectrum that would be helpful.
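Here is a sketch of how a reader might tell the file types apart. It assumes (as other open-source ChemStation readers appear to do) that the file begins with a one-byte length followed by the ASCII version string, e.g. `03 '1' '8' '1'`. The decoder names are placeholders, and this is not the code in ba59590. An unknown version such as 180 simply raises an error here.

```r
# Sketch under an assumed header layout: byte 1 = length n, bytes 2..(n+1) =
# ASCII version string such as "179" or "181".
read_ch_version <- function(bytes) {
  n <- as.integer(bytes[1])
  rawToChar(bytes[2:(n + 1)])
}

# Choose a decoding routine by version; names are hypothetical placeholders
ch_parser_for <- function(bytes) {
  version <- read_ch_version(bytes)
  switch(version,
         "179" = "decode_179",
         "181" = "decode_181",
         stop("Unsupported .ch version: ", version))
}

# With a real file: bytes <- readBin("FID1A.ch", "raw", n = 16)
header <- as.raw(c(3, charToRaw("181"), 0, 0, 0))
ch_parser_for(header)
```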

@Phenomniverse
Author

Ok, our GCs are mostly running various iterations of Chemstation B.04.03, but we do have a couple on OpenLab CDS Chemstation Edition, and a couple on a newer version of OpenLab. We also have a GCMS running MSD Chemstation D.02.00.275.

We have GCFID data going back to 2013, and maybe earlier, which I suspect has been collected using older versions of chemstation.

Your parser (parser="chromconverter", format_in='chemstation_fid') works pretty well for more recent files, but it doesn't work so well with the 2013 file that I tried.

With the newer files, the actual chromatogram data is good, which is the important bit, but some of the attribute fields that read_chroms generates are a bit off. For my purposes this information is more readily accessed through the other files stored in the .D folder anyway (eg .xls, .xml, .txt files, depending on what was specified in the chemstation acquisition method).
But for further troubleshooting, if you want to delve into it: the read_chroms attribute called 'notebook' is actually the sample name. The attribute called "parent.directory" is the operator's name as entered in Chemstation when the sequence is started. And the "instrument" attribute seems to be populated by a concatenation of the acquisition method name and a string ("GCI\002GC\024" in the file I looked at) that doesn't relate to the instrument in any way that is obvious to me.
I note that when outputting to .CSV, read_chroms now saves the file under the directory name, which is helpful.

Regarding the older files (tested on one from 2013 and another from 2014), the parser isn't working as expected. The rt (x-axis) data seems okay, but the value (y-axis) data is very strange. See the image below (I've plotted it as points because the line plot just fills the whole screen with black). Also, the attribute fields contain a lot of unreadable characters. I'm not sure which version of Chemstation generated these older files, although maybe it's contained in the software field, if that could be interpreted. The newer file just had "Asterix ChemStation" in that field, though.

attr(,"version")
[1] "181"
attr(,"file.type")
[1] "GC DATA FILE\022\003\005Èå\022\022\003\005!\u009dY|´"
attr(,"notebook")
[1] "blue cypress 111wÿÿÿÿ¸é\022ÐÁüw\030\aóNì\022ÿÿÿÿ©\024xBLUE CYPRESS.M\022ø£S\020Àì\022Üõ\022äè\022hè\022\003Œ*øwó\030\aó\003ÐH\003\b@è\022€w0ê\022U\037øwÐ*øwÿÿÿÿ@ê\022ÐÁüw\030\aóÿÿÿÿ©\024xDS\\BLUE CYPRESS.M\\\022H\003\001v\r"
attr(,"parent.directory")
[1] "M_GC-3ú¢ÂiwNì\022ç\016ö\022áw¢î\022lû\022\037Ð\020Pÿÿÿÿ,ì\022\bs\020P@„î\022\001¸î\022Ð0=@tì\022\003Œ*øw€ì\022\003Œ*øwó\030\aó\003ÐH\003\bXì\022€wHî\022U\037øwÐ*øwÿÿÿÿXî\022ÐÁüw\030\aóä\"UXŽV"
attr(,"run_date")
[1] "23/04/2014 11:49:59 AMlû\022j½xó"
attr(,"instrument")
[1] "HP G1530A\003H\003\b\03000\027\002GC\022Ðí\022ÐH\003\bœî\022\023¬î\022ÐWøwìWøw\016BLUE CYPRESS.M|\023¼\001\v\001XŽV\003¸\001\v\001\001\u0090î\022\002lû\022D\037\\|¨!W|ÿÿÿÿ4ï\022ä\"U\"cBÐï\022ä\"Ux\004À\021\bàáÃ\001\004(ï\022"
attr(,"method")
[1] "BLUE CYPRESS.M|\023¼\001\v\001XŽV\003¸\001\v\001\001\u0090î\022\002lû\022D\037\\|¨!W|ÿÿÿÿ4ï\022ä\"U\"cBÐï\022ä\"Ux\004À\021\bàáÃ\001\004(ï\022x>Ö\030,¼\001\v\001´\001\v\001\020@<ï\022ð?@ï\022\001ð?\001Tï\022 ¤ãwÈ3\02400\027\001"
attr(,"software")
[1] "\v\001\002œ\v\001ÄC\001ý\a\002\024È/\024œ\v\001Ä\177\002è2ý\a¨3\024\001\177\016\177\002hô\022\034G£\001ˆ’C\001ˆ’C\001ˆ’C\001ˆ’C\001ˆ’C\001ˆ’C\001ˆ’C\0013\021e<\b\v\001\024ò\022çäX|ëáX|q4e0ú\0248÷\001\bèñ\022\003\002\022\a\004@aø\a\t\004\021v\024ÝáwØ\u0090\023Pñ\022\035Œ*øwóø\vó\0350Õü\a(ñ\022\026\030ó\022U\037øwÐ*øwÿÿÿÿ(ó\022ÐÁüwø\vó \005M$êø\a>\003\003Í«ºÜ\u0090ñ\022Akáw\024ÝáwÆ\004S\021v \005M€_Œ\b€_Œ\b°ñ\022sPâw\024ÝáwÆ\004S\021v \005MÐñ\022æŒÿv\024ÝáwÆ\004S\021v \005M"
attr(,"unit")
[1] "pA"
attr(,"signal")
[1] ""
attr(,"time_range")
[1] 2.291713e-05 3.000312e+01
attr(,"data_format")
[1] "long"
attr(,"parser")
[1] "chromConverter"

@Phenomniverse
Author

Further to the above, the chromconverter parser works well for FID data generated on OpenLab CDS Chemstation Edition C.01.

@ethanbass
Owner

Thanks for sending the additional files and doing all this testing. I actually have a function in the package already that reads the chemstation XLS files and attaches some of the metadata. I forgot to "activate it" for the new parser, but probably that would be worth doing. I will try to fix up the attribute fields from the binary files as well when I have time, thanks for the tips.

I'm not sure what's going on with the other Chemstation files you sent yet. They seem to be encoded differently from the newer files, even though they're also "181"-type files.

@ethanbass ethanbass changed the title Required path formating unclear (path_out, path to 'OpenChrom') Path problems with system calls on Windows Dec 3, 2022
@ethanbass
Owner

@Phenomniverse I just updated the agilent FID parser to a new version that can read the older 181 files you sent me. (06a80bf). Give it a whirl and please let me know if you have any other FID files that aren't being translated correctly.

@Phenomniverse
Author

Hi @ethanbass happy new year! Great to know that you're still working on this!

Ok, so I tested the latest update against about a dozen random GCFID files dating back to ~2004, and it appears to be working nicely, I haven't found any that it doesn't work for yet, but will keep you updated if I do. I did have to tweak my code a little bit because it appears that the structure of the output from read_chroms has changed a little bit with this latest update. Previously I was able to convert the read_chroms output to a dataframe and it would have two columns capturing the x- and y-axis values respectively. Now it seems that the x-axis data is appearing as the row names when converted to a data frame. This isn't a big problem, I can change it back to the format that the rest of my code expects with one line. Just curious as to why the change in the structure of the read_chroms output?

@ethanbass
Owner

ethanbass commented Jan 15, 2023

Glad it seems to be working. Sorry for the unexpected change re: the format of the data.frames. This is still not completely consistent between different parsers wrapped by the read_chroms function. The way I have it now is actually more consistent with the output of most of the other parsers -- the way you were accustomed to with the two columns was more of an outlier. I could consider maybe adding an additional option to generate data.frames in the two column format, if you think it would be helpful.

@Phenomniverse
Author

No, it's all good. The output read_chroms generates has the advantage of including the metadata from the FID file, which I lose as soon as I turn it into a data frame with only two columns. It's only for the sake of my existing code that I'm converting it into a data frame. Thanks again for your efforts on this project!

@ethanbass
Owner

ethanbass commented Jan 15, 2023

Thank you as well for all your helpful feedback! By the way, the data.frame generated by read_chroms should actually contain the same metadata as the matrix version, but it doesn't automatically get printed to the console for whatever reason. You should be able to access it using attributes().
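Here is a small base-R illustration of that point, using made-up values. Custom attributes attached to a data.frame stay on the object, but `print()` shows only the columns, so you have to retrieve them with `attributes()` or `attr()`.

```r
# Made-up values: a two-column chromatogram with metadata attached
df <- data.frame(rt = c(0.00, 0.01, 0.02), intensity = c(22.7, 22.7, 22.8))
attr(df, "unit") <- "pA"
attr(df, "software") <- "Asterix ChemStation"

print(df)             # only the columns are printed
attributes(df)$unit   # "pA"
attr(df, "software")  # "Asterix ChemStation"
```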

@Phenomniverse
Author

read_chroms outputs as a list:

fid <- read_chroms("FID1A.CH",parser="chromconverter",format_in="chemstation_fid")

typeof(fid)
[1] "list"

str(fid)
List of 1
 $ FID1A.CH: num [1:36001, 1] 22.7 22.7 22.7 22.7 22.7 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:36001] "-0.000584383328755697" "0.000248926827725658" "0.00108223698420701" "0.00191554714068837" ...
  .. ..$ : chr "Intensity"
  ..- attr(*, "version")= chr "181"
  ..- attr(*, "sample_name")= chr "Wild Crafted Buddha Wood Oil Batch 52"
  ..- attr(*, "run_date")= chr "07-Sep-22, 17:41:36"
  ..- attr(*, "instrument")= chr "GCI"
  ..- attr(*, "method")= chr "EO BASEMETHOD 2020.M"
  ..- attr(*, "software")= chr "Asterix ChemStation"
  ..- attr(*, "unit")= chr "pA"
  ..- attr(*, "signal")= chr ""
  ..- attr(*, "time_range")= num [1:2] -0.000584 29.998581
  ..- attr(*, "data_format")= chr "long"
  ..- attr(*, "parser")= chr "chromConverter"

I turn it into a dataframe using:

library(data.table)
fid_df <- as.data.frame(fid)
fid_df <- setDT(fid_df, keep.rownames = TRUE)[]
colnames(fid_df) <- c('rt', 'value')

The setDT function from data.table package is converting the rownames into a column here.
I lose the attributes data in converting it to a data frame, but I can always access it from the original output if I need it.

Maybe a better way to do it would be:
fid_df <- data.frame(rt = as.numeric(rownames(fid$FID1A.CH)), value = unname(fid$FID1A.CH[, 1]))

I suppose I could pass the attributes from fid$FID1A.CH to fid_df as a comment, but there's no real need for it.

@ethanbass
Owner

ethanbass commented Jan 16, 2023

You could do something like this if you want to transfer the metadata over:

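fid_df <- lapply(fid, function(xx){
  # convert to data.frame (retention times are stored as the matrix row names)
  x_new <- data.frame(rt = as.numeric(rownames(xx)), value = unname(xx[, 1]))
  # transfer metadata, skipping the matrix-only dim and dimnames attributes
  meta <- attributes(xx)
  for (nm in setdiff(names(meta), c("dim", "dimnames"))) {
    attr(x_new, nm) <- meta[[nm]]
  }
  x_new
})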

@Phenomniverse
Author

yeah okay, playing around with this makes me realise that retaining the metadata is pretty helpful.
I notice that the metadata for some of those older FID files contains strings of nonsense characters, although the actual metadata is in the strings as well. For example, see below, where some of the attribute fields are normal and others contain extraneous characters:

$ attributes:List of 11
  ..$ version    : chr "181"
  ..$ sample_name: chr "REF TTOØ\022\024\002\002‘|E"
  ..$ run_date   : chr "12/12/2018 12:03:24 AM"
  ..$ instrument : chr "HP G1530AÀ&$÷j‘|\030ß\022"
  ..$ method     : chr "REF-TTO.MÐý\177àý\177Ðý\177:4@\004x(e$Hß\022"
  ..$ software   : chr "‰$‘|?'‘|\002\a@u\002\a$á\022\a$‘|\177'‘|\002\a´à\022\034á\022Dá\022\235'‘|X \003@u\002\a$á\022n \200|\002\a@u\0"| __truncated__
  ..$ unit       : chr "pA"
  ..$ signal     : chr ""
  ..$ time_range : num [1:2] 1.82e-04 3.00e+01
  ..$ data_format: chr "long"
  ..$ parser     : chr "chromConverter"
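The pattern above (a correct value followed by junk) is what you would see if a fixed-width metadata slot were read in full rather than up to its stored length. Here is a sketch that assumes each field starts with a one-byte character count (a Pascal-style string). The offsets and widths of the real fields are not given in this thread, and `read_pascal_field()` is a hypothetical helper rather than what v0.4.0 does.

```r
# Sketch under an assumed layout: a fixed-width slot whose first byte gives
# the number of meaningful characters that follow.
read_pascal_field <- function(bytes, offset, width) {
  slot <- bytes[offset:(offset + width - 1)]
  # Never read past the end of the slot, even if the count byte is corrupt
  n <- min(as.integer(slot[1]), width - 1)
  if (n == 0) return("")
  rawToChar(slot[2:(n + 1)])
}

# Made-up 16-byte slot: length 7, "REF TTO", then leftover bytes
slot <- c(as.raw(7), charToRaw("REF TTO"),
          as.raw(c(0xD8, 0x12, 0x14, 0x02, 0x02, 0x91, 0x7C, 0x45)))
read_pascal_field(slot, offset = 1, width = 16)
```

Reading all 16 bytes of the same slot would return "REF TTO" plus the trailing bytes, much like the `sample_name` value above.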

@ethanbass
Owner

ethanbass commented Jan 16, 2023 via email

@ethanbass
Owner

@Phenomniverse FYI, this issue with the metadata in the Chemstation 181 files should be resolved in the latest version (v0.4.0). Also, you can now toggle the format of the data.frames using the data_format argument: wide format (the default) returns retention times as row names, while long format returns retention times in their own column.

2 participants