Added column-filter functionality #16

reenberg · 2017-08-01T00:52:46Z

<tl;dr> This adds an option to filter out entire columns and rows based on column data in a .csv file, for when you only want to create a small table from a large .csv file.

If the column_filter is not specified or is an empty list, then the table is
not modified. Otherwise the raw_table_list is filtered based on the values in
the column_filter (i.e., column indexes not specified in the filter are removed).

Each element in the column_filter list must be an integer or a dictionary
with at least the key 'col'.

Specifying an integer in the column_filter list makes sure that column
index is kept (first column is index 0 -- python list indexing).

A dictionary can additionally carry one of the following keys (note: the keys
are mutually exclusive, and specifying more than one has undefined behaviour).

- filter: filters out (removes) the row if the content of the cell in this
  column doesn't match the value of this key (exact string matching). The
  value may be a list of strings to be matched.
- regex: filters out (removes) the row if the content of the cell in this
  column doesn't match. The value of this key is passed directly to
  `re.match(pattern, string)` as the `pattern`, with the cell value as the
  `string`. Note: we currently assume that only a small number of regexes is
  used, so we don't compile them ourselves but rely on the built-in caching
  of the `re` module to handle it for us.

Example: This example won't filter out any column or row, but it demonstrates
the three different ways in which you may specify a column-filter. Try
changing any one of them, and see how columns or rows are filtered from the
resulting table.

``` {.table}
---
caption: "*Bar* table"
markdown: yes
column-filter:
    - 0
    - col: 1
      regex: ".*B|[\\d]"
    - col: 2
      filter: ['C', '3']
---
A,B,C
1,2,3
```
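
The semantics described above can be sketched in plain Python. This is only an illustration, not the code in this pull request: the name `apply_column_filter` and its signature are hypothetical, and keeping the surviving columns in their original table order is an assumption.

```python
import re


def apply_column_filter(rows, column_filter=None):
    """Hypothetical helper: rows is a list of lists of cell strings."""
    if not column_filter:
        return rows  # not specified or empty list: table is not modified

    keep = set()    # column indexes that survive
    row_tests = []  # (column index, predicate on the cell text)
    for item in column_filter:
        if isinstance(item, int):
            keep.add(item)
            continue
        col = item['col']
        keep.add(col)
        if 'filter' in item:
            wanted = item['filter']
            if isinstance(wanted, str):
                wanted = [wanted]  # a single string behaves like a one-item list
            row_tests.append((col, lambda cell, wanted=wanted: cell in wanted))
        elif 'regex' in item:
            pattern = item['regex']
            row_tests.append(
                (col, lambda cell, pattern=pattern: re.match(pattern, cell) is not None))

    columns = sorted(keep)  # assumption: original column order is preserved
    return [
        [row[i] for i in columns]
        for row in rows
        if all(test(row[col]) for col, test in row_tests)
    ]


table = [['A', 'B', 'C'], ['1', '2', '3']]
column_filter = [0, {'col': 1, 'regex': r'.*B|[\d]'}, {'col': 2, 'filter': ['C', '3']}]
unchanged = apply_column_filter(table, column_filter)
first_and_last = apply_column_filter(table, [0, 2])
```

With the filter from the example, `unchanged` equals the input table. Note that in this sketch the header row is tested like any other row, which is why the example regex also has to match `B`.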


ickc · 2017-08-01T05:15:27Z

I think we need more discussion on this for the syntax of this. You might try to ask people in Markdown, tables and CSV - Google Groups to see if there are any suggestions there, and/or open an issue here (I'll open one soon). For now I'll put this pull request on hold.

ickc · 2017-08-01T05:22:23Z

And remember to include tests in pull requests. There are 2 kinds of tests here. One is a Python unit test that calls the functions and compares the results. The other runs pandoc directly and checks whether the output native AST is the same as a predefined one (usually generated automatically and just eyeballed to verify it's doing what it's supposed to do).

sergiocorreia · 2017-08-04T02:52:20Z

> I think we need more discussion on this for the syntax of this

I also agree. Ideally, you want a solution that is both general and simple to implement. For instance, allowing lambdas that will be eval()uated at runtime.

ickc · 2017-08-13T05:30:13Z

@reenberg, can you briefly describe what exactly you want to accomplish? i.e. let's forget about syntax and how to do it for the moment, and just give a small before & after example of what you want to do. In particular, how you would want the regex to behave.

e.g. the simplest kind of filter will be 1,2,3,..... filtered to 1,2 only, extracting only the first 2 columns.

reenberg · 2017-08-22T10:35:08Z

My current issue is that I'm writing a document where I have a spreadsheet of events.

This actually started out as a .csv file that I edited with a spreadsheet editor, but it has now evolved: I found the need for formulas (time calculations, column concatenation) and conditional formatting (to easily show groups of rows, etc.), and thus it is now a .ods document that I export to .csv.

The .csv file describes all the event data, such as type, start, end, various kinds of descriptions, work loads, etc.
I use this information to generate various pieces of information in my main document. One example is a table of specific event types and some of their descriptions.

Thus my need is specifically to be able to filter only some of the columns (e.g., 1,3,4,6,7) and then I also need to filter the rows, such that I only get the rows concerning the specific event types.

This has previously been dealt with by some nasty LaTeX macros that I just couldn't be bothered to maintain any more.

My initial implementation with the 'filter' and 'regex' properties was just what came to my mind when coding it. However, specifically I'm using the regex right now to easily filter out 'event', 'event2' and 'event3'. I use suffix numbering of the event type to have the events in different colours when generating some of the other overviews (think something like graphs).
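
A small sketch of how such a regex behaves under `re.match`, which anchors only at the start of the cell. The pattern and the sample cell values here are made up for illustration and are not taken from my actual spreadsheet.

```python
import re

# Hypothetical pattern: "event", optionally followed by digits, then end of cell.
EVENT_PATTERN = r"event\d*$"

event_types = ["event", "event2", "event3", "eventual", "meeting"]

# Rows whose event-type cell matches are kept; the rest are filtered out.
kept = [cell for cell in event_types if re.match(EVENT_PATTERN, cell)]
print(kept)  # ['event', 'event2', 'event3']

# Without the trailing "$", re.match only checks the beginning of the string,
# so a cell such as "eventual" would also pass.
loose_match = re.match(r"event", "eventual") is not None
```

The trailing `$` matters because `re.match` does not require the pattern to consume the whole cell.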


alerque · 2019-08-28T07:33:58Z

I'm accomplishing something similar using CSVKit, specifically csvcut to get just the columns I want in a preprocessing step before dumping the results into the markdown. There are quite a few tools with similar filtering capabilities, including Python-based ones. In general I think this workflow is better. I would be skeptical of putting a bunch of active code in the content of my data, and would be skeptical of Pantable if it was trying to be a full-fledged data manipulation tool rather than just a format conversion tool.

@reenberg Why do you think this should be implemented in Pantable itself?
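
For anyone who would rather stay in Python for that preprocessing step, the idea can be sketched with the standard library alone. This is not CSVKit and does not reproduce the options of csvcut or csvgrep; the function name `cut_and_grep`, its parameters, and the sample data are all hypothetical.

```python
import csv
import io
import re


def cut_and_grep(csv_text, keep_columns, match_column, pattern):
    """Keep only some columns, and only rows whose match_column cell matches pattern.

    The first row is treated as a header and is always kept.
    """
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")

    header = next(reader)
    writer.writerow([header[i] for i in keep_columns])
    for row in reader:
        if re.match(pattern, row[match_column]):
            writer.writerow([row[i] for i in keep_columns])
    return out.getvalue()


source = "type,start,end\nevent,1,2\nmeeting,3,4\nevent2,5,6\n"
small = cut_and_grep(source, keep_columns=[0, 2], match_column=0, pattern=r"event\d*$")
```

The resulting `small` string is the reduced CSV that would then be pasted or included into the markdown table block.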

ickc · 2019-08-28T08:00:33Z

The “pandoc way” to accomplish a task like this, without bloating a filter, is to have another filter that processes the csv before pantable (i.e. piping a filter before pantable). But inevitably this other filter has to be designed for pantable (e.g. which class to use), so it is not strictly composable (i.e. not entirely independent of pantable). This hypothetical other filter is therefore more like a pantable plugin, which may be why people want to put them together.

I think the solution you mention has to go through the shell (e.g. I’d use a makefile with an intermediate file chaining them together). A solution like this is not universal. Also, a build like this makes the document less reproducible (in the sense that more details on how the document is built are needed).

reenberg · 2019-08-28T18:31:42Z

It's always nice that someone cares, even if it's just shy of 2 years since I left a reply to your comments @ickc.

To be honest, I don't think that I knew about CSVkit back then. And I guess I just fell victim to the classic "everything looks like a nail when you have a hammer". I don't think that my proposed changes do anything more than what can be achieved with a good combination of csvcut and csvgrep. So with that in mind, this can be discarded.

However, I remember thinking that it was cleaner not having to set up an "elaborate build pipeline"/Makefile to carve out all the intermediate files that I needed back then. It felt smoother when the people writing the document and invoking pantable could just specify, inside the markdown, what data they needed from the csv file. Not everyone is comfortable piping unix tools.

ickc and others added 30 commits November 19, 2016 16:09

makefile: revert change to style

8fbd85f

makefile: cleanup and improve html and pdf output

4472494

add travis deploy on pypi

e30555e

version: 0.0.5

7145051

makefile: fix bug in make README.md

5a7dc81

auto git tag and travis deploy to PyPI

42b9314

makefile: bug fix

0a30ccf

makefile: add git push in make pypi

50cad89

makefile: pypi: git push with current tag only

98ecb8e

makefile: improve make clean

89f4217

make test includes both pytest and pep8

78adc50

pantable.py: functions on options in-place, rather than reassign

7c6562f

pantable.py: functions on raw_table_list in-place, rather than reassign

5d26b50

makefile: add python linters and renamed some targets

572256b

pantable: improve init_table_options

b9c8780

pantable.py: improve check_table_options

85ee6a3

pantable: use list comprehension to append to lists

145a80d

make autopep8

b341e6a

passing pylint

508c39e

makefile: make test include pylint

b122309

bump version

48248ef

markdown defaulted to False instead

cd138d1

Because by default CSV tables shouldn’t contain markdown syntax

fix travis: install prerequisite of tests

700c197

make init: add tests requirements

de019c7

GitHub README: add badges

9aad3b5

version bumped

3fe63ea

travis: add coveralls

f18beb1

travis: pip install coveralls

0224791

travis: reorder coveralls installation

e0d74b7

Revert "travis: reorder coveralls installation"

89dbe21

This reverts commit e0d74b7.

ickc and others added 16 commits January 13, 2017 21:14

Merge pull request ickc#7 from ickc/python2

754032a

Python2 support

version bumped for python2 support

14a4554

travis: pypi: non-universal wheel (unpasteurized py3)

1404291

travis: install wheel for python2

669d062

travis: git reset to unpasteurized py3 dist

250647a

travis: cleanup and comments

6ec9d9d

travis: update pip only when not pypy3

7f639e5

travis: wheels built not using git hard reset

d2fb8a9

setup.py: reject panflute 1.10

bb44783

setup.py: reject panflute 1.9.7

59dd9dd

version bumped

dd16e18

travis: minor: escape regex

7b31f0a

setup.py: relaxed requirements on panflute versions

1f4cff0

because contaminated panflute 1.9.7 and 1.10 has been deleted.

remove to_bool: pyyaml already does this

5351aa4

.travis.yml: allow pypy3 failure

a9fc4e1

pip no longer support py3.2

ickc mentioned this pull request Aug 1, 2017

Filtering Subcells of CSV #17

Open

ickc mentioned this pull request Aug 2, 2017

Use another CSV parser? #21

Closed

Minor refactorings.

c645dae

* Changed column_filter to table_filter. * Changed the filter into a generator.

reenberg force-pushed the feature/columns-filter branch from 9c62628 to c645dae Compare October 12, 2017 08:04

ickc force-pushed the master branch 2 times, most recently from c51c4a4 to accb831 Compare November 10, 2020 01:55
