
Allow file comments with genfromtxt(..., names=True) #351

Closed
wants to merge 2 commits into from

4 participants

Paul Natsuo Kishimoto, Don't Add Me To Your Organization a.k.a The Travis Bot, njsmith, Charles Harris
Don't Add Me To Your Organization a.k.a The Travis Bot

This pull request fails (merged 9f45975 into 143fb18).

Paul Natsuo Kishimoto
khaeru commented July 12, 2012

Ah. OK, so this change will not work with the following:

# gender age weight
M   21  72.100000
F   35  58.330000
M   33  21.99

(using the table from test_commented_header). But it will work with any of the following:

gender age weight # these are the headers
M   21  72.100000
F   35  58.330000
M   33  21.99

# here is a general file comment
# it is spread over multiple lines
gender age weight
M   21  72.100000
F   35  58.330000
M   33  21.99

# here is a general file comment
# the columns in this table are:
gender age weight
# following this line are the data:
M   21  72.100000
F   35  58.330000
M   33  21.99

etc. If this is an acceptable trade-off, I can rewrite the test.

njsmith
Owner

So it sounds like there's a behavioural change here, where before we required header names to have a comment marker at the beginning and now we require them not to? Or something like that? Discussions about what behaviour we want belong on numpy-discussion, because lots of people who might be affected don't read PRs, so you should send a note there explaining the issue.

Paul Natsuo Kishimoto
khaeru commented July 13, 2012

OK, will do.

Don't Add Me To Your Organization a.k.a The Travis Bot

This pull request passes (merged 74e071e into 143fb18).

njsmith commented on the diff July 17, 2012
numpy/lib/npyio.py
@@ -1345,8 +1346,10 @@ def genfromtxt(fname, dtype=float, comments='#', delimiter=None,
     try:
         while not first_values:
             first_line = fhd.next()
-            if names is True:
-                if comments in first_line:
+            if names is True and comments in first_line:
+                if skip_header == -1:
+                    first_line = first_line.split(comments)[0]
+                else:
                     first_line = asbytes('').join(first_line.split(comments)[1:])

njsmith (Owner) added a note July 17, 2012

FYI: .split takes an optional second argument, maxsplit. All of these should just become first_line.split(comments, 1), and the join parts can go away.
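A small stdlib sketch of the suggestion, using str lines rather than the bytes the 2012 code produced via asbytes. The function names are hypothetical. It shows that the two forms agree when a line holds a single comment marker and diverge when it holds more than one.

```python
def after_marker_join(line, comments="#"):
    # Pattern in the diff: split on every marker, drop the first piece,
    # and glue the remaining pieces back together with an empty string.
    return "".join(line.split(comments)[1:])


def after_marker_split1(line, comments="#"):
    # Suggested form: split at most once via the maxsplit argument.
    # Only valid when the marker is present (the diff checks
    # `comments in first_line` first); otherwise [1] raises IndexError.
    return line.split(comments, 1)[1]


header = "# gender age weight"
# With a single marker, both forms give the same text.
assert after_marker_join(header) == after_marker_split1(header)

# With a second marker they differ: join drops it, maxsplit=1 keeps it.
line = "# gender age # weight"
print(repr(after_marker_join(line)))    # ' gender age  weight'
print(repr(after_marker_split1(line)))  # ' gender age # weight'
```

So the rewrite is a drop-in replacement for ordinary headers; for a line with several markers, split(comments, 1) keeps the later markers in the header text instead of silently removing them.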

Charles Harris
Owner

Ping the travisbot.

Charles Harris charris closed this April 03, 2013
Charles Harris charris reopened this April 03, 2013
Charles Harris
Owner

I think this needs a test.

@njsmith As you noted, this seems to change behavior. I don't recall the discussion about that, or whether anything was decided.

njsmith
Owner

A quick skim of that mailing list thread suggests that this change would break several people's code. I didn't read enough to see if a more generally acceptable solution was found.

Charles Harris
Owner

Sounds to me like it should be closed then. I don't know why travisbot is trying to install this in 2.4, but I've seen that in a few other spots.

Charles Harris
Owner
charris commented May 11, 2013

Going to close this.

Charles Harris charris closed this May 11, 2013

Showing 1 changed file with 13 additions and 10 deletions.

numpy/lib/npyio.py (23 lines changed)
@@ -1189,11 +1189,15 @@ def genfromtxt(fname, dtype=float, comments='#', delimiter=None,
         Which columns to read, with 0 being the first.  For example,
         ``usecols = (1, 4, 5)`` will extract the 2nd, 5th and 6th columns.
     names : {None, True, str, sequence}, optional
-        If `names` is True, the field names are read from the first valid line
-        after the first `skip_header` lines.
-        If `names` is a sequence or a single-string of comma-separated names,
-        the names will be used to define the field names in a structured dtype.
-        If `names` is None, the names of the dtype fields will be used, if any.
+        Field names for structured dtype output. May be one of:
+
+          - True: field names are read from the first line after the initial
+            `skip_header` lines. If that line is commented and `skip_header` is
+            not -1, the portion *after* `comments` is used.
+          - None: field names from the `dtype` argument are used, if any.
+          - A sequence: field names are taken from the sequence.
+          - A string: comma-separated substrings are used as field names.
+
     excludelist : sequence, optional
         A list of names to exclude. This list is appended to the default list
         ['return','file','print']. Excluded names are appended an underscore:
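The rewritten `names` entry lists four accepted forms. A minimal sketch of how those forms map to field names, written as a hypothetical helper `resolve_names` (not part of NumPy): `header_fields` stands in for the values genfromtxt would parse from the header line, and stripping whitespace around comma-separated names is an assumption of this sketch.

```python
def resolve_names(names, dtype_names=None, header_fields=None):
    # Hypothetical helper mirroring the four documented forms of `names`.
    if names is True:
        # True: use the fields parsed from the header line.
        return list(header_fields or [])
    if names is None:
        # None: fall back to the dtype's field names, if any.
        return list(dtype_names) if dtype_names else None
    if isinstance(names, str):
        # A single string: comma-separated substrings become field names
        # (surrounding whitespace stripped; an assumption of this sketch).
        return [part.strip() for part in names.split(",")]
    # Any other sequence: take the names as given.
    return list(names)


print(resolve_names("gender, age, weight"))
print(resolve_names(None, dtype_names=("gender", "age")))
print(resolve_names(True, header_fields=["gender", "age", "weight"]))
```

The string check comes before the generic-sequence branch because a str is itself a sequence and would otherwise be split into single characters.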
@@ -1237,9 +1241,6 @@ def genfromtxt(fname, dtype=float, comments='#', delimiter=None,
     -----
     * When spaces are used as delimiters, or when no delimiter has been given
       as input, there should not be any missing data between two fields.
-    * When the variables are named (either by a flexible dtype or with `names`,
-      there must not be any header in the file (else a ValueError
-      exception is raised).
     * Individual values are not stripped of spaces by default.
       When using a custom converter, make sure the function does remove spaces.
 
@@ -1345,8 +1346,10 @@ def genfromtxt(fname, dtype=float, comments='#', delimiter=None,
     try:
         while not first_values:
             first_line = fhd.next()
-            if names is True:
-                if comments in first_line:
+            if names is True and comments in first_line:
+                if skip_header == -1:
+                    first_line = first_line.split(comments)[0]
+                else:
                     first_line = asbytes('').join(first_line.split(comments)[1:])
             first_values = split_line(first_line)
     except StopIteration:
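A plain-Python model of the patched header loop, useful for checking the trade-off against the three layouts from the discussion. It assumes names=True, works on str lines, and uses a hypothetical `split_line` that approximates genfromtxt's whitespace splitter (cut at the first comment marker, then split on whitespace); `header_names` is likewise hypothetical and not NumPy API.

```python
def split_line(line, comments="#"):
    # Assumed stand-in for genfromtxt's whitespace splitter: cut the line at
    # the first comment marker, strip it, and split on runs of whitespace.
    line = line.split(comments)[0].strip()
    return line.split() if line else []


def header_names(lines, comments="#", skip_header=0):
    # Hypothetical model of the patched loop, assuming names=True.
    it = iter(lines[max(skip_header, 0):])  # skip_header=-1 skips nothing
    first_values = []
    while not first_values:
        first_line = next(it)  # StopIteration if no usable line is found
        if comments in first_line:
            if skip_header == -1:
                # New branch: keep the text *before* the marker.
                first_line = first_line.split(comments)[0]
            else:
                # Old behaviour: keep the text *after* the marker.
                first_line = "".join(first_line.split(comments)[1:])
        first_values = split_line(first_line, comments)
    return first_values


commented_header = ["# gender age weight", "M   21  72.100000"]
general_comment = [
    "# here is a general file comment",
    "# it is spread over multiple lines",
    "gender age weight",
    "M   21  72.100000",
]
trailing_comment = ["gender age weight # these are the headers", "M   21  72.100000"]

for label, lines in [("commented header", commented_header),
                     ("general comment", general_comment),
                     ("trailing comment", trailing_comment)]:
    print(label, header_names(lines, skip_header=0), header_names(lines, skip_header=-1))
```

Under this model the commented-header table only yields gender/age/weight through the old branch, while the general-comment and trailing-comment layouts only yield them with skip_header=-1. That matches the trade-off described earlier in the thread.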