improve I/O tools for working with data #146

### Importing data from a file

We want a `read()` function for the command line that reads data from the given file and transforms it into an easily processed data structure for use in WebPPL functions. We expect this to be used primarily for CSV files, but it would be nice to support other input types as well.

The call would be `read(filename, [opts])`, where `opts` is an object containing any relevant parameters.

Here are some file types we might want to consider:
- **CSV**: Needs to correctly infer the types of numeric values, strings, and booleans. `opts` could specify the delimiter (default to comma), whether there's a header (default to true), and so on. We could implement the internals using [nodeCSV](https://www.npmjs.com/package/csv) or [babyparse](https://github.com/Rich-Harris/BabyParse).

  We thought the best output would be a list of objects, keyed by header names, with each element in the list corresponding to a row of the input CSV. This would support post-processing steps like filtering, plucking a single column, and omitting irrelevant values.

```
    [{headerLabel1 : value1,
      headerLabel2 : value2,
      headerLabel3 : value3,
      ...},
     {...}]
```
- **JSON**: Could be implemented using a combination of fs and JSON.parse:

```
    var fs = require('fs');
    var data = JSON.parse(fs.readFileSync('file', 'utf8'));
```
- **text**: For NLP applications, sometimes you just want to pull in a bunch of text. `opts` could take the encoding to use.
- **images**: For vision applications, it might be cool to be able to read in a .png or .jpg and parse it into a matrix of pixels. Could be implemented using [node-opencv](https://github.com/peterbraden/node-opencv).
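To make the proposed CSV output shape concrete, here is a minimal sketch of the parsing step. `parseCsv` and `inferType` are hypothetical names, not part of WebPPL, and the parser is deliberately naive (it does not handle quoted fields or embedded delimiters); a real implementation would more likely delegate to one of the libraries mentioned above.

```javascript
// Hypothetical helper: convert one raw CSV cell to a boolean, number, or string.
function inferType(cell) {
  var s = cell.trim();
  if (s === 'true') { return true; }
  if (s === 'false') { return false; }
  if (s !== '' && !isNaN(Number(s))) { return Number(s); }
  return s;
}

// Hypothetical helper: naive CSV parser that returns a list of row objects.
// Assumes no quoted fields and no delimiters inside values.
function parseCsv(text, opts) {
  opts = opts || {};
  var delimiter = opts.delimiter || ',';   // default to comma
  var hasHeader = opts.header !== false;   // default to true
  var rows = text.split(/\r?\n/)
    .filter(function(line) { return line.trim() !== ''; })
    .map(function(line) { return line.split(delimiter); });
  if (rows.length === 0) { return []; }
  // Without a header row, fall back to placeholder names col0, col1, ...
  var header = hasHeader
    ? rows.shift().map(function(h) { return h.trim(); })
    : rows[0].map(function(_, i) { return 'col' + i; });
  return rows.map(function(cells) {
    var obj = {};
    header.forEach(function(name, i) {
      obj[name] = inferType(cells[i] === undefined ? '' : cells[i]);
    });
    return obj;
  });
}

var data = parseCsv('name,score,passed\nalice,0.9,true\nbob,3,false\n');
// Post-processing on the list of objects: filter rows, then pluck a column.
var passedScores = data
  .filter(function(row) { return row.passed; })
  .map(function(row) { return row.score; });
```

With this shape, filtering and plucking reduce to ordinary `filter` and `map` calls over the list.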

We can determine the file type from the file extension. If it is not .csv, .tsv, .json, or one of the image encodings we support, it could default to text.
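The extension-based dispatch could look roughly like the sketch below. `detectFileType` and the `typeByExtension` table are hypothetical, and the set of image extensions is an assumption, since the supported image encodings have not been decided.

```javascript
var path = require('path');

// Hypothetical lookup table. The image extensions listed here are an assumption.
var typeByExtension = {
  '.csv': 'csv',
  '.tsv': 'tsv',
  '.json': 'json',
  '.png': 'image',
  '.jpg': 'image'
};

// Map a filename to a reader type, defaulting to plain text.
function detectFileType(filename) {
  var ext = path.extname(filename).toLowerCase();
  return typeByExtension.hasOwnProperty(ext) ? typeByExtension[ext] : 'text';
}
```

Files with no extension or an unrecognized one fall through to `'text'`, matching the default described above.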
### Writing ERPs to file

We want a `write()` function that writes one or more ERPs to a file for post-processing and analysis in other languages like R or Python. The output should be a long-form .csv. There are two major issues to consider here:
1. The results of many model-fitting exercises are ERPs with lists of lists or lists of objects in the support. To put this in long-form, we need to write one line for each of these internal lists or objects, which share the same probability. We'd also like to be able to support multiple sets of object keys: one element in the support may be `{type : 'a', alphaVal : .5}` and another may be `{type: 'b', betaVal : 1.5}`. We'd like to write a csv with 'type', 'alphaVal' and 'betaVal' as column headers and NAs filled for rows where they aren't specified.
2. For model comparison applications, we often want to write multiple ERPs to the same file with one or more labels identifying which ERP is which. To put this in long-form, we need to prepend these labels to each row. We also need to be able to specify whether we're going to append to the given file or write a new one.
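The first issue (flattening to long form with a union of keys) can be sketched as follows. This assumes, purely for illustration, that an ERP has already been converted to a plain array of `{ value, prob }` entries where each `value` is a list of objects; that representation and the name `toLongForm` are hypothetical and not WebPPL's actual ERP interface.

```javascript
// Assumption: `entries` is an array of { value: [obj, ...], prob: number }.
// Returns column headers plus one row per inner object, with 'NA' for missing keys.
function toLongForm(entries) {
  // Collect the union of keys across every inner object, in first-seen order.
  var headers = [];
  entries.forEach(function(entry) {
    entry.value.forEach(function(obj) {
      Object.keys(obj).forEach(function(key) {
        if (headers.indexOf(key) === -1) { headers.push(key); }
      });
    });
  });
  // Emit one row per inner object; rows from the same entry share its probability.
  var rows = [];
  entries.forEach(function(entry) {
    entry.value.forEach(function(obj) {
      var cells = headers.map(function(key) {
        return obj.hasOwnProperty(key) ? obj[key] : 'NA';
      });
      rows.push(cells.concat([entry.prob]));
    });
  });
  return { headers: headers.concat(['prob']), rows: rows };
}

var result = toLongForm([
  { value: [{ type: 'a', alphaVal: 0.5 }, { type: 'b', betaVal: 1.5 }], prob: 0.7 },
  { value: [{ type: 'a', alphaVal: 0.1 }], prob: 0.3 }
]);
// result.headers is ['type', 'alphaVal', 'betaVal', 'prob']
// result.rows[1] is ['b', 'NA', 1.5, 0.7]
```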

It seems like the right call for this is to first create a writer object:

```
var myWriter = csvWriter(filename, mode)
```

where `mode` can be either `w` (write) or `a` (append), mirroring Python's file modes. Then pass this writer object into the `write` function along with an object setting various options:

```
write(myWriter, opts)
```

The two options we had in mind are (1) `parameterHeaders`, which specifies which headers will be found in the list of lists or list of objects, and (2) `additionalLabels`, the list of labels to prepend to each line of the csv.
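One way the writer and the two options might fit together is sketched below. Several things here are assumptions rather than part of the proposal: the function is named `writeRows` and takes an extra `rows` argument (an array of arrays of parameter values) because the proposed `write(myWriter, opts)` signature does not yet say how the ERP is passed in; the label columns get placeholder header names `label1`, `label2`, and so on; and values are joined without any CSV escaping.

```javascript
var fs = require('fs');

// Hypothetical writer object: records the target file and the write/append mode.
function csvWriter(filename, mode) {
  if (mode !== 'w' && mode !== 'a') {
    throw new Error("mode must be 'w' or 'a'");
  }
  return { filename: filename, mode: mode };
}

// Hypothetical stand-in for write(); `rows` is an assumed extra argument.
function writeRows(writer, rows, opts) {
  var labels = opts.additionalLabels || [];
  // Prepend the identifying labels to every row.
  var lines = rows.map(function(row) {
    return labels.concat(row).join(',');
  });
  // Only a fresh file gets a header line; appends assume one already exists.
  if (writer.mode === 'w') {
    var labelNames = labels.map(function(_, i) { return 'label' + (i + 1); });
    lines.unshift(labelNames.concat(opts.parameterHeaders || []).join(','));
  }
  // The 'w' and 'a' modes map directly onto Node's fs flags.
  fs.writeFileSync(writer.filename, lines.join('\n') + '\n', { flag: writer.mode });
}
```

For model comparison, one would call this once in `w` mode for the first ERP and then in `a` mode for each additional ERP, changing `additionalLabels` each time.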

