Fix duplicated garbage #130
Codecov Report
@@ Coverage Diff @@
## neat_report #130 +/- ##
===============================================
- Coverage 79.83% 79.78% -0.06%
===============================================
Files 22 22
Lines 1582 1583 +1
Branches 273 273
===============================================
Hits 1263 1263
Misses 275 275
- Partials 44 45 +1
Codecov Report
@@ Coverage Diff @@
## 0.3.6dev #130 +/- ##
============================================
- Coverage 79.81% 79.77% -0.04%
============================================
Files 22 22
Lines 1580 1582 +2
Branches 274 274
============================================
+ Hits 1261 1262 +1
Misses 275 275
- Partials 44 45 +1
I agree it's not crucial, but maybe, if possible, we can make it optional. I think it depends on the customer: if they need to export to, let's say, a CSV file, we have to remove trailing whitespace.
@andersonberg But nested fields aren't exportable to CSV, are they?
Yes, you're right. So, yeah, we don't have to find whitespace in nested data.
98ec029 to 837c0ab
Co-Authored-By: Adrián Chaves <adrian@chaves.io>
@Gallaecio Thanks for noting the inconsistencies.
The PR does several things:
- `apply(str)` to search for patterns in nested data

`apply(str)` makes strings from nested objects (lists, dicts, and any combination of them). The downside is that the current regex won't find leading or trailing whitespace in nested data:
"['days inn lombard street hotel san francisco', 'lombard street days inn', 'days inn san francisco ', 'san francisco days inn hotel lombard st', 'san francisco day inn lombard street']"
Here `r"(?P<spaces>^\s|\s$)"` doesn't detect the whitespace in `francisco '`.

Personally, I don't think it's crucial to find that whitespace in nested data. Any comments?
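To make the whitespace case above concrete, here is a minimal sketch using only the standard library. It assumes that `apply(str)` ends up calling Python's built-in `str()` on each cell, so `str()` on a list stands in for it here. The sample values are shortened from the example string, and the variable names are illustrative, not taken from the code in this PR.

```python
import re

# Pattern quoted in the description: one whitespace character
# at the very start or the very end of the string.
SPACES = re.compile(r"(?P<spaces>^\s|\s$)")

# A flat cell with a trailing space, and a nested cell that contains the same value.
flat_value = "days inn san francisco "
nested_value = ["lombard street days inn", "days inn san francisco "]

# Assumption: this mirrors what apply(str) does to a nested cell.
nested_as_text = str(nested_value)

flat_match = SPACES.search(flat_value)
nested_match = SPACES.search(nested_as_text)

# The flat string ends with a space, so the pattern matches.
print(flat_match is not None)
# The stringified list starts with "[" and ends with "]",
# so the inner trailing space is no longer at a string boundary.
print(nested_match is not None)
```

The anchors `^` and `$` only see the boundaries of the whole stringified cell, which is why the trailing space inside the list goes unreported.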
Performance (now vs before):
- nested data
  - 2k sample: 331ms vs 800ms
  - 12k sample: 1.4s vs 1.6s (2.4s including flattening)
  - 90k sample: 2:07m vs 1:31m (matt takes about 1:30m too)
- not nested
  - 10k sample: 560ms vs 600ms
apply.str() than on larger number of flat columns.

And here's how it looks:
![Screenshot 2019-07-01 at 12 58 00](https://user-images.githubusercontent.com/10396557/60456823-561e2b80-9c08-11e9-893c-2a989499648f.png)
@ivankivanov you will be interested in this one.