Big Data is the latest trend and hugely in demand in the market. All my work and scripts related to Big Data are stored in this repository.

srihari4mbatech/BigData_Learning

BigData_Learning

Big Data is the latest trend and hugely in demand in the market. All my work and scripts related to Big Data are stored in this repository, along with details of the courses I am following.

-> I am going through "Introduction to Big Data". The course has 4 weeks of content.

 In week 1, the tutor walks through case studies that explain the essence of big data.
 One of the case studies is "Providing Precision Medicine to Patients", which is about tailoring therapy to the individual patient.
 There is one PPT on SlideShare explaining the market value of big data systems.
 The types of big data sources: data generated by people, sensors and systems,
 that is, machines, people and organizations.

Big data can be structured, semi-structured or unstructured. Data enables decisions.

 Three main properties that make a device "smart":
 * It can connect to other devices
 * It can collect and analyze data autonomously
 * It can provide environmental context
 Activity trackers:
 * Distance run

Machines generate a lot of data; they are major generators of big data.

    In-situ: bring the computation to the data.
    RDBMS: take the data to the computation space and do the analysis there.
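
To make the in-situ idea concrete, here is a minimal sketch in plain Python. The `partitions` list is a made-up stand-in for data held on separate nodes, and all function names are hypothetical. It only illustrates that both styles give the same answer while the in-situ style moves far fewer values.

```python
# Hypothetical partitions: each inner list stands in for data stored on a separate node.
partitions = [
    [3, 5, 8],
    [13, 21],
    [34, 55, 89, 144],
]


def centralized_mean(parts):
    """Take the data to the computation: copy every record to one place first."""
    all_rows = [x for part in parts for x in part]
    values_moved = len(all_rows)  # every record crosses the "network"
    return sum(all_rows) / len(all_rows), values_moved


def local_summary(part):
    """Runs where the data lives; only a (sum, count) pair leaves the node."""
    return sum(part), len(part)


def in_situ_mean(parts):
    """Bring the computation to the data: combine small per-node summaries."""
    summaries = [local_summary(p) for p in parts]
    total = sum(s for s, _ in summaries)
    count = sum(n for _, n in summaries)
    values_moved = 2 * len(summaries)  # two numbers per partition
    return total / count, values_moved


mean_a, moved_a = centralized_mean(partitions)
mean_b, moved_b = in_situ_mean(partitions)
print(mean_a == mean_b, moved_a, moved_b)
```

With real data sizes the gap between the two "moved" counts is what matters: the summaries stay tiny no matter how large each partition grows.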

Culture shift to real-time processing

    Customer relationships
    Fraud detection
    System monitoring/control

  Scalable computer systems
    Amazon Web Services
    Microsoft Azure

People generate data through YouTube, Instagram and Twitter, and most of it is unstructured. In one day, Facebook users generate a huge amount of data compared with US academic data. This unstructured data brings a lot of challenges. Unstructured data is data that doesn't have a data model. Humans generate a lot of unstructured data.

Unstructured data includes text, images, video and audio.
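
A small sketch of how the three kinds of data look when you try to read them in Python. The sample strings are invented for illustration. The point is only that structured data has fixed columns, semi-structured data carries its own flexible structure, and unstructured text has no data model until you impose one.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields.
structured = "customer_id,amount\n1,19.99\n2,5.00\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: self-describing, but fields can be nested or missing.
semi_structured = '{"user": "a", "tags": ["x", "y"], "profile": {"city": "Pune"}}'
doc = json.loads(semi_structured)

# Unstructured: free text with no data model; we have to impose one,
# here by the crude step of splitting into lowercase words.
unstructured = "Loved the new store layout, but checkout took forever!"
words = unstructured.lower().split()

print(rows[0]["amount"], doc["profile"]["city"], len(words))
```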

    Velocity is the amount of data generated in a given amount of time; the velocity of human-generated data is increasing rapidly.
    Acquiring, storing, cleaning and processing unstructured data is a daunting task. 80 to 90% of the world's data is
    unstructured.
    Advantages of Hadoop, Storm, Spark and NoSQL.
    Challenge is an opportunity.
    Hadoop is designed to support big data.
    Social media and financial market data are high-velocity data. Unstructured data has no data model.
    Traditional ETL doesn't work well with unstructured data.
    Neo4j is a graph database (this is a NoSQL database).
    Key-value pair database, e.g. Cassandra (this is a NoSQL database; more precisely, a wide-column store).
    Crisismapers.net use
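
To show the difference between the key-value and graph models mentioned above, here is a rough sketch using only Python dictionaries and sets. It does not use the Cassandra or Neo4j APIs; the data and the `friends_of_friends` helper are hypothetical and exist only to illustrate the two ways of organizing data.

```python
# Key-value model: the store only knows "key -> value"; lookups go through the key.
kv_store = {}
kv_store["user:42"] = {"name": "Asha", "city": "Chennai"}
profile = kv_store["user:42"]

# Graph model: nodes plus relationships; queries walk the edges.
follows = {
    "asha": {"ravi", "mei"},
    "ravi": {"mei"},
    "mei": {"tom"},
    "tom": set(),
}


def friends_of_friends(graph, start):
    """People two hops away from `start` who are not already followed directly."""
    direct = graph.get(start, set())
    two_hops = set()
    for person in direct:
        two_hops |= graph.get(person, set())
    return two_hops - direct - {start}


print(profile["city"], friends_of_friends(follows, "asha"))
```

A relationship question like "friends of friends" is a natural fit for the graph model, while the key-value model is at its best when you already know the key you want.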

Scale and speed are most important.

Organization-generated data:

  • Commercial Transactions
  • Government Open Data
  • Sale Transaction Data

Highly structured data is generated by organizations. Spreadsheets are structured data.
A data model defines the individual columns in the tables and the relationships between them.
Many organizations have built separate databases for each department.
Cloud-based solutions are agile, low-capital options for organizational data analysis. Organizations must pay attention to breaking up silos of information.
The benefits of organization-generated data come from combining it with other types of data.

Among retail organizations, Walmart uses big data heavily: 250 million customers and 10,000 stores. They use Twitter data, local events, local weather, in-store purchases and online clicks, covering social media, public events, general media, personal choices and online presence. They use it for launching new products, improving predictive analytics and customizing recommendations.

By using big data, companies are moving ahead in the areas below.

  • Efficient operations
  • Higher sales
  • Improved Safety
  • Customer Satisfaction
  • Better Profit Margins
  • Improved Product Placement

Integrating different datasets
  • The Carnival Cruises case study is one example
    • Structured + unstructured data + price optimization = increased revenue
    Managing data and turning it into knowledge.
    Data integration process:
    1. Discovering
    2. Accessing
    3. Monitoring
    4. Modeling
    5. Transforming
Different types of Data:
1. Flat files
2. Relational data from databases
3. XML Files data
4. JSON Files data
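
As a rough illustration of integrating the four kinds of data listed above, the sketch below reads a tiny flat file, a relational table (an in-memory SQLite database), an XML document and a JSON document, and maps them all to one common record shape. All sample data, field names and function names are made up for this example. A real integration job would also need the discovery, monitoring and modeling steps.

```python
import csv
import io
import json
import sqlite3
import xml.etree.ElementTree as ET

# Invented sample inputs, one per data type.
FLAT_TEXT = "id,name\n1,Asha\n"
JSON_TEXT = '[{"id": 2, "name": "Ravi"}]'
XML_TEXT = "<customers><customer id='3'><name>Mei</name></customer></customers>"


def from_flat(text):
    """Flat file (CSV): every value arrives as a string, so convert the id."""
    return [{"id": int(r["id"]), "name": r["name"], "source": "flat"}
            for r in csv.DictReader(io.StringIO(text))]


def from_relational():
    """Relational data: query a table in a throwaway in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    conn.execute("INSERT INTO customers VALUES (4, 'Tom')")
    rows = conn.execute("SELECT id, name FROM customers").fetchall()
    conn.close()
    return [{"id": i, "name": n, "source": "relational"} for i, n in rows]


def from_xml(text):
    """XML: the id is an attribute, the name is a child element."""
    root = ET.fromstring(text)
    return [{"id": int(c.get("id")), "name": c.findtext("name"), "source": "xml"}
            for c in root.iter("customer")]


def from_json(text):
    """JSON: already a list of objects; just normalize the fields."""
    return [{"id": int(r["id"]), "name": r["name"], "source": "json"}
            for r in json.loads(text)]


# Transform step: everything now shares one shape, so it can be combined and sorted.
integrated = sorted(
    from_flat(FLAT_TEXT) + from_relational() + from_xml(XML_TEXT) + from_json(JSON_TEXT),
    key=lambda r: r["id"],
)
for record in integrated:
    print(record)
```

Keeping a `source` field on each record is one simple way to stay able to trace where an integrated row came from.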

After integrating all the data, we get richer data. Integrating data from multiple sources decreases data complexity, increases data usability and improves data collaboration.

What's one surprising or uncomfortable thing you may be providing data on? Is there a non-social media (or shopping) application you realize you do give information to (perhaps that you hadn't thought of before)?

Answers to quiz

  1. Which of the following is an example of big data utilized in action today? Social Media
  2. What reasoning was given for the following: why is the "data storage to price ratio" relevant to big data?
  3. What is the best description of personalized marketing enabled by big data?
  4. Of the following, which is an example of personalized marketing related with big data? Google ordering ads to show items based on recent and past search results.
  5. What is the workflow for working with big data? Big Data -> Better Models -> Higher Precision
  6. Which is the most compelling reason why mobile advertising is related to big data?
  7. What are the three types of diverse data sources? Machine Data, Organizational Data, and People
  8. What is an example of machine data? Weather station sensor output.
  9. What is an example of organizational data? Disease data from Center for Disease Control.
  10. Of the three data sources, which is the hardest to implement and streamline into a model? People
  11. Which of the following summarizes the process of using data streams? Theory -> Models -> Precise Advice
  12. Where does the real value of big data often come from?
  13. What does it mean for a device to be "smart"? Collect data and services autonomously.
  14. What does the term "in situ" mean in the context of big data? Bringing the computation to the location of the data.
  15. Which of the following are reasons mentioned for why data generated by people are hard to process? Choose all that apply. (1) Skilled people to analyze the data are hard to come by. (2) The velocity of the data is very high. (3) The data is very unstructured.
  16. What is the purpose of retrieval and storage; pre-processing; and analysis in order to convert multiple data sources into valuable data? To allow scalable analytical solutions to big data.
  17. Which of the following are benefits of organization-generated data? Choose all that apply. All are correct except "high velocity".
  18. What are data silos and why are they bad?
  19. Which of the following are benefits of data integration? Choose all that apply. All are correct except "monitoring of data".
