2018-web/data/talks/PC-54320.yaml
| # Talk details are specified in YAML files | |
| # YAML was selected because we can use multi-line strings and add | |
| # comments in the file. | |
| speaker_name: "Abbas Taher" | |
| talk_title: "How to Aggregate Interest Data and User “Likes” using PySpark" | |
| # At least 1 tag is necessary!! | |
| talk_tags: | |
| - "Python" | |
| - "PySpark" | |
| - "Big Data" | |
| - "Marketing" | |
| - "Likes" | |
| - "Apache Spark" | |
| talk_abstract: "Big Data is becoming an important component of IT in large organizations. This talk presents three methods to aggregate big data: one using Python dictionaries and two using PySpark, the Python API for Apache Spark. It aims to present these methods in a simple, accessible way." | |
| talk_details: "Apache Spark is one of the most popular Big Data frameworks, and PySpark is the Python API for using Spark. PySpark is a great choice when you need to scale up your jobs to work with big data files. | |
| In this short overview, we shall present three methods to aggregate big data. First, we shall use Python dictionaries; then we shall present the two methods “GroupBy” and “ReduceBy” to perform the same aggregation using PySpark. All three approaches will be presented and explained using a Jupyter Notebook. | |
| " | |
| # Markdown is supported | |
| about_author: 'Abbas Taher has 25 years of experience in IT and business. During his career, he has founded multiple start-ups and worked in companies like Microsoft and Etisalat. In early 2018, he founded GoFlek Inc. to build the next generation of machine learning engines using Python. He also works as a consultant to CN Rails and is a member of the team integrating big data into their systems.' | |
| # web link will only show if about_author section is present | |
| author_website: '' |