# 2018-web/data/talks/PC-53138.yaml
# Talk details are specified in YAML files
# YAML was selected because we can use multi-line strings and add
# comments in the file.
speaker_name: "Faisal Dosani"
talk_title: "Open Sourcing at Work"
# At least 1 tag is necessary!!
talk_tags:
- "open source"
- "licensing"
- "copyright"
- "data"
- "security"
- "testing"
- "best practices"
- "data science"
talk_abstract: "We just open sourced two projects, datacompy and locopy, with roots in data science and engineering, and we will showcase both. While it is exciting and rewarding to share your ideas with the world, it isn't always easy. Thinking about licenses, copyrights, and protecting confidential information is a must!"
talk_details: "Working in a large organization that is embracing the mantra 'open source first' is really exciting. Part of this journey is making sure we give back to the open source community when we can. Two of our projects had gained traction internally: `datacompy` and `locopy`. As part of this commitment, we wanted to open source these projects so that others could use them and contribute back.
DataComPy is a package for comparing two Pandas DataFrames. It started as something of a replacement for SAS's PROC COMPARE, offering more functionality than `pandas.DataFrame.equals`: it prints summary statistics and lets you tune how close values must be to count as a match. It was later extended to bring the same functionality to Spark DataFrames.
Locopy helps load flat files to S3 and then into Amazon Redshift, and it assists with ETL processing. It is database driver (adapter) agnostic and provides basic functionality to move data to S3 buckets, run COPY commands to load data from S3 into Redshift, and run UNLOAD commands to unload data from Redshift into S3.
While building these products was exciting and fun, the legal considerations were just as interesting and complex. They required collaboration among many teams, including security, licensing, brand, and IP/copyright. We'll explore the projects and the other considerations that can make or break a release if you decide to put a project into the wild. We'll also cover the roadblocks we faced in these areas."
# Markdown is supported
about_author: ''
# web link will only show if about_author section is present
author_website: ''