Better export for statistics tools #496

Closed
mitchellsundt opened this issue Jul 9, 2015 · 10 comments

@mitchellsundt
Contributor

mitchellsundt commented Jul 9, 2015

Migrated to getodk/aggregate#31 by spacetelescope/github-issues-import

Originally reported on Google Code with ID 495

I also sorely miss the ability to export data to statistical programmes such
as Stata or R including variable labels and value labels, i.e. essentially the underlying
logic of the questionnaire. Before, we were using pencil-and-paper questionnaires and the
free (but not open source) CSPro as the data entry tool. It was possible to export directly
to Stata, which was extremely handy. Any change in the questionnaire was immediately
and correctly reflected in the Stata export. Any digital questionnaire includes
(a) the question, (b) the variable name, (c) the answer categories in words
and (d) the associated values in numbers. As far as I understand, (a)-(c) are lost
in the ODK .csv export and only (d) is kept. Is that correct? IMHO, this would mean a lot of
(error-prone) duplication of work.

How difficult would it be to include (a)-(c) and to develop an export to statistical programmes?
Has anyone already started to write an export filter for one of these statistical programmes?

--Gerry

Reported by yanokwa on 2012-01-06 00:01:02
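
To make the request above concrete, here is a minimal sketch of one possible workaround: generating a Stata do-file that re-attaches items (a)-(c) to an already exported CSV. It is not an ODK or Aggregate feature. The function name `build_do_file`, the dictionary layout, and the sample variables are all hypothetical. The sketch also assumes that labels contain no double quotes and that variable names are already valid lowercase Stata names.

```python
def build_do_file(csv_name, variables):
    """Return the text of a Stata do-file that imports a CSV and labels it.

    `variables` is a list of dicts with the keys:
      name     - variable (column) name, item (b)
      question - question text, item (a)
      choices  - optional {numeric value: answer text}, items (d) and (c)
    This layout is an assumption made for this illustration.
    """
    lines = [f'import delimited "{csv_name}", clear']
    for var in variables:
        name = var["name"]
        question = var["question"]
        # Attach the question text as the variable label.
        lines.append(f'label variable {name} "{question}"')
        choices = var.get("choices")
        if choices:
            # Define a value label set and bind it to the variable.
            pairs = " ".join(f'{value} "{text}"' for value, text in choices.items())
            lines.append(f"label define {name}_lbl {pairs}")
            lines.append(f"label values {name} {name}_lbl")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up example variables, not taken from any real form.
    example = [
        {"name": "sex", "question": "Sex of respondent",
         "choices": {1: "Male", 2: "Female"}},
        {"name": "age", "question": "Age in years"},
    ]
    print(build_do_file("result.csv", example))
```

Running the generated do-file in Stata next to the CSV export would apply the labels. The labelling information still has to be taken from the form definition, which is the duplication of work described above.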

@mitchellsundt
Contributor Author

mitchellsundt commented Jul 9, 2015

Migrated to opendatakit/aggregate#31 (comment) by spacetelescope/github-issues-import

For us to implement this, we would have to research the proper formats. You could
help us with this by filing an issue with links to the format specification and a couple
of examples (they can contain fake data) that show the original xform used to gather the data
and the original csv export, along with a file that shows the desired output format
that combines them. The issue tracker allows you to attach files.

-- Waylon

Reported by yanokwa on 2012-01-06 00:02:36

@mitchellsundt
Contributor Author

mitchellsundt commented Jul 9, 2015

Migrated to opendatakit/aggregate#31 (comment) by spacetelescope/github-issues-import

Thank you very much for opening this issue in order to have an export of data to statistical
programmes, such as Stata or R, including variable and value labels.

I contacted the Stata developers and they sent me the links to the specifications
of the Stata .dta file format, by Stata version:
* 8/9:   http://www.stata.com/help.cgi?dta_113
* 10/11: http://www.stata.com/help.cgi?dta_114
* 12:    http://www.stata.com/help.cgi?dta

Thanks for looking into it.
--Gerry

Reported by gerry.treppel on 2012-01-08 12:57:04

@mitchellsundt
Contributor Author

Some versions ago, Stata introduced an improved mechanism to import/export Stata data
via an XML based mechanism. They call it "Stata dta XML". Since ODK makes extensive
use of XML, this seems to me the best and easiest method to create import/export filters
for Stata.

I asked Stata support whether the Stata-dta-XML format has any limitations compared with
the binary Stata-dta file format. Here is their reply:
"The Stata XML format shares the same limitations as the Stata DTA format.
There is not much documentation for the Stata XML format because it is really
just the DTA format written out into XML." This means the descriptions given in
comment 2 are also helpful for an export to Stata XML.

Reported by gerry.treppel on 2012-01-17 22:06:04

@mitchellsundt
Contributor Author

Reported by mitchellsundt on 2012-01-30 23:59:50

  • Labels added: Type-Enhancement, Aggregate, Future
  • Labels removed: Type-Defect

@mitchellsundt
Contributor Author

Our team is still very interested in having such a feature.

As to Yaw's request in the 2nd comment, I am attaching an xform questionnaire (questionnaire.xml),
a filled-out instance (instance.xml), the exported results (result.csv) and a Stata-compliant
xml file that contains the value labels and the original questions as variable labels (labels.xml).
What we are looking for is something along the lines of the last one (I am no expert;
there may be more appropriate ways to do this).

I did not fill out the form in Collect, but just wrote the files by hand, so there might be
mistakes.

If this, or something along these lines, will never be an official feature of Aggregate, our
team would consider commissioning a customized version.

Thanks

Olivier

Reported by o.kalmus on 2012-02-15 16:50:55


- _Attachment: [questionnaire.xml](https://storage.googleapis.com/google-code-attachments/opendatakit/issue-495/comment-5/questionnaire.xml)_
- _Attachment: [instance.xml](https://storage.googleapis.com/google-code-attachments/opendatakit/issue-495/comment-5/instance.xml)_
- _Attachment: [result.csv](https://storage.googleapis.com/google-code-attachments/opendatakit/issue-495/comment-5/result.csv)_
- _Attachment: [labels.xml](https://storage.googleapis.com/google-code-attachments/opendatakit/issue-495/comment-5/labels.xml)_
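
As a rough illustration of where the variable and value labels could come from, the sketch below reads question labels and choice labels out of an XForm using only the Python standard library. The embedded form is a made-up sample, not the attached questionnaire.xml, and `extract_labels` is a hypothetical helper. It assumes labels are written inline in the form body. Forms that keep their labels in translation (itext) blocks would need extra handling.

```python
import xml.etree.ElementTree as ET

# Namespace used by unprefixed XForms elements (select1, input, label, item, value).
XF = "{http://www.w3.org/2002/xforms}"

# Made-up sample form, for illustration only.
SAMPLE_XFORM = """<h:html xmlns="http://www.w3.org/2002/xforms"
        xmlns:h="http://www.w3.org/1999/xhtml">
  <h:head>
    <h:title>Sample</h:title>
    <model>
      <instance><data id="sample"><sex/><age/></data></instance>
    </model>
  </h:head>
  <h:body>
    <select1 ref="/data/sex">
      <label>Sex of respondent</label>
      <item><label>Male</label><value>1</value></item>
      <item><label>Female</label><value>2</value></item>
    </select1>
    <input ref="/data/age">
      <label>Age in years</label>
    </input>
  </h:body>
</h:html>"""


def extract_labels(xform_text):
    """Map each variable name to its question text and {value: label} choices."""
    root = ET.fromstring(xform_text)
    result = {}
    for control in root.iter():
        ref = control.get("ref")
        if ref is None:
            continue  # not a form control bound to a data field
        label = control.find(XF + "label")
        choices = {
            item.findtext(XF + "value"): item.findtext(XF + "label")
            for item in control.findall(XF + "item")
        }
        # Use the last path segment of the ref as the variable name.
        result[ref.rsplit("/", 1)[-1]] = {
            "question": label.text if label is not None else "",
            "choices": choices,
        }
    return result


if __name__ == "__main__":
    for name, info in extract_labels(SAMPLE_XFORM).items():
        print(name, info)
```

The resulting dictionary holds the question text, variable names and answer categories that the CSV export drops, so an exporter could combine it with the CSV values to write a labelled file.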

@mitchellsundt
Contributor Author

Reported by yanokwa on 2012-05-25 15:46:40

@mitchellsundt
Contributor Author

Reported by yanokwa on 2012-05-26 01:10:54

@mitchellsundt
Contributor Author

Hello! This would be a fantastic feature.  I wanted to know if this will be implemented.

Reported by patelmm79 on 2013-08-12 22:09:13

@mitchellsundt
Contributor Author

The core team does not have time to implement this.  

I am unclear whether this should be part of an automated publisher (Aggregate) for
a specific statistics server (?), or part of the XLSForm tool (i.e., XLSForm would
generate multiple output files).

If I understand these packages, I believe it would be a change to XLSForm?

If someone commissions the writing of these features, we can fold those changes into
the main tree.


Reported by mitchellsundt on 2013-08-13 00:04:06

  • Labels added: XLSForm

@mitchellsundt
Contributor Author

As I see it, it would be part of Aggregate for a specific statistics server.  The file
that Aggregate would output combines the data & syntax / structure into one file.

Reported by patelmm79 on 2013-09-17 16:36:44
