Add support for CPORT (compressed XPORT format) #6

Open
selik opened this issue Oct 22, 2016 · 14 comments

Comments

@selik
Owner

selik commented Oct 22, 2016

It seems some archaic FDA submission rules require(d) SAS XPT or CPT-format files. The Aggregate Analysis of ClinicalTrials.gov Database hosts the same data in Oracle "dmp", pipe-delimited text, and SAS CPORT formats. Perhaps we can use these files as a sort of Rosetta stone to infer the specification of the SAS CPT/CPORT format.
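A minimal sketch of the Rosetta stone idea, assuming at least some character values survive verbatim in the CPORT bytes (they may not, if the values are compressed). The helper names `known_values_from_line` and `find_known_values` are hypothetical and not part of this library. The sketch takes values from one line of the pipe-delimited export and reports the byte offsets where each one appears in the binary file.

```python
def known_values_from_line(line, delimiter="|"):
    """Split one line of the pipe-delimited export into non-empty field values."""
    return [field.strip() for field in line.split(delimiter) if field.strip()]


def find_known_values(blob, values, encoding="ascii"):
    """Map each known value to every byte offset where it appears verbatim.

    An empty list means the value was not found as-is, which hints that the
    region holding it is compressed or encoded differently (an assumption).
    """
    hits = {}
    for value in values:
        needle = value.encode(encoding)
        offsets = []
        start = blob.find(needle)
        while start != -1:
            offsets.append(start)
            start = blob.find(needle, start + 1)
        hits[value] = offsets
    return hits
```

Values that are found give anchor points for guessing record boundaries and field widths. Values that are missing are also informative, since they point at regions that are probably compressed.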

@dhanababum

Hi, great contribution. I need some information about CPORT. My client currently requires reading .cpt format files in Python, but I'm unable to find a layout specification for CPORT like the one that exists for XPORT (https://support.sas.com/techsup/technote/ts140.pdf). Did you find any information about the CPORT format? Or is any help needed on this?

Thanks in advance.

@selik
Owner Author

selik commented Sep 16, 2017

@dhanababum I believe it stands for "Compressed export" or something like that. Unfortunately, we'd have to reverse-engineer it.

The binary CPORT format is not openly documented. The data values in files produced by PROC CPORT can be compressed and the files may be password-protected.

https://www.loc.gov/preservation/digital/formats/fdd/fdd000464.shtml#notes

@smiiil

smiiil commented Jan 11, 2021

Hi,
Appreciate all the work on this. I'm also running into an issue opening compressed transport files. Any luck with using Python for CPORT files?

@selik
Owner Author

selik commented Jan 11, 2021

@smiiil Sorry, I haven't gotten around to it, and I don't expect to for a while. I'm happy to coach you through it, though.

@smiiil

smiiil commented Jan 11, 2021

Sure, willing to help.

@selik
Owner Author

selik commented Jan 11, 2021

My design idea was that the cport module could extend classes from the v56 module, trying to reuse as much of the logic as possible.

Unfortunately, there seem to be some bugs in the latest version, so maybe it's best to start by fixing those, which'd get you familiar with the logic. The decision to extend Pandas made the code much more complex. Hopefully it made the API more pleasant, but I've started to worry that it was a mistake.

@cmdugan13

Thanks for all of your work, @selik! I'm following this thread since I'm also running into this issue with CPORT files.

@selik
Owner Author

selik commented Jan 19, 2022

@cmdugan13 Is the CPORT file you're trying to read publicly available?

@cmdugan13

It is. I can't link the file directly, but it's C2419P1M.XPORT in the attached folder:
2021 Midyear-Final-Model Software.zip
It's also on CMS's website, if you need the source.
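Because a file extension such as .XPORT does not say which format a file actually uses, a quick check of the first bytes can rule an uncompressed XPORT file in or out. This is a sketch only: the prefix below is the start of the library header record described in the TS-140 XPORT specification, the function names are hypothetical, and a failed check shows only that the file is not plain XPORT, not that it is CPORT.

```python
# Start of the 80-byte library header record of an uncompressed XPORT (V5) file.
XPORT_V5_PREFIX = b"HEADER RECORD*******LIBRARY HEADER RECORD!!!!!!!"


def looks_like_xport_v5(header):
    """Return True if the given leading bytes match the XPORT V5 library header."""
    return header.startswith(XPORT_V5_PREFIX)


def file_looks_like_xport_v5(path):
    """Read the first 80 bytes of a file and test them against the XPORT prefix."""
    with open(path, "rb") as f:
        return looks_like_xport_v5(f.read(80))
```

If `file_looks_like_xport_v5("C2419P1M.XPORT")` returns False, the file is likely in some other transport format despite its name.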

@selik
Owner Author

selik commented Jan 19, 2022

I'll take a look next weekend / late January.

@selik
Owner Author

selik commented Feb 14, 2022

This is going to be tricky. SAS Universal Viewer doesn't support CPORT files. Apparently the universe is smaller than we thought. https://support.sas.com/kb/42/356.html

@lscott15

In case it helps, I found some sample datasets that CMS published in 2014, available as both TRN and TXT files:
STDIAG.TRN
STDIAG.TXT

@selik
Owner Author

selik commented May 12, 2022

@lscott15 Thanks for the tip. I'll check it out.

@thekevshow

Was there any progress made on this? I am also willing to help.
