mahailuo/pyspark_notes

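These are notes taken while learning PySpark, based on the book 《Python+Spark2.0+hadoop机器学习与大数据实战》 by 林大贵. If you are interested, please search for the book and buy a legitimate copy. Download pyspark_learning_notes.ipynb and open it in Jupyter Notebook to read it. The notes are aimed at readers who have some Python background and are new to PySpark.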

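Notes:

1. The code in this project is a Jupyter Notebook; make sure you are correctly connected to Spark and Hadoop.
2. The sc (SparkContext) and spark (SparkSession) instances are created automatically.
3. The code was run on Hadoop 2.7 and Spark 2.3.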

Code examples:

1. textfile = sc.textFile("file:/c:/f/test.txt")  # local file: the path must start with file:
2. textfile = sc.textFile("hdfs:/hdfsfile/test/test.txt")  # HDFS file: the path must start with hdfs:
3. strRDD = sc.parallelize(['app', 'note', 'oran', 'app', 'grap'])  # build an RDD from a Python list
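
The three examples above can be gathered into one small sketch. The helper names local_file_uri, hdfs_uri, get_spark_context and load_examples are hypothetical and not part of the notebook. The sketch assumes pyspark is installed and that the paths from the examples exist in your environment. get_spark_context is only needed when sc has not already been created for you.

```python
def local_file_uri(path):
    """Build a file: path in the same form as example 1 (file:/c:/f/test.txt)."""
    # Normalize Windows backslashes and drop any leading slash before prefixing.
    normalized = path.replace("\\", "/").lstrip("/")
    return "file:/" + normalized


def hdfs_uri(path):
    """Build an hdfs: path in the same form as example 2."""
    return "hdfs:/" + path.lstrip("/")


def get_spark_context(app_name="pyspark_notes"):
    """Return a SparkContext, reusing a running session if there is one."""
    # Imported lazily so the path helpers above work without pyspark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName(app_name).getOrCreate()
    return spark.sparkContext


def load_examples(sc):
    """Create the three RDDs from the examples; sc is an existing SparkContext."""
    # textFile is lazy: a missing file is only reported once an action runs.
    local_rdd = sc.textFile(local_file_uri("c:/f/test.txt"))
    hdfs_rdd = sc.textFile(hdfs_uri("/hdfsfile/test/test.txt"))
    str_rdd = sc.parallelize(["app", "note", "oran", "app", "grap"])
    return local_rdd, hdfs_rdd, str_rdd
```

In a notebook where sc already exists, load_examples(sc) returns the three RDDs; calling an action such as collect() on the third one is a quick way to check that the connection to Spark works.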

About

Notes I took while learning PySpark.