This repository has been archived by the owner on Jul 23, 2023. It is now read-only.

Refactoring idea - data extraction library for FB json files #15

Closed
epogrebnyak opened this issue Aug 2, 2020 · 9 comments

Comments

@epogrebnyak

Maybe it is worthwhile to separate the data extraction and visualisation functionality? The data extraction utility should accept the working directory with the data and produce serialised data on friends, likes, etc. This part can be covered by unit tests.

@epogrebnyak
Author

The visualisation part can work on the results of data extraction. It would also be useful to expose the data extraction functions to the user, so that one can construct one's own visualisations.

@epogrebnyak
Author

Something like below:

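import json
import pandas as pd

def read_json(filename: str):
    # Read the export file explicitly as UTF-8 text.
    with open(filename, encoding="utf-8") as f:
        return json.load(f)

def get_timestamp(x: int) -> pd.Timestamp:
    # Interpret the integer as seconds since the Unix epoch.
    return pd.Timestamp(x, unit="s")

def decode(s: str) -> str:
    # Reverse the mojibake: the text is UTF-8 bytes that were read as latin-1.
    return s.encode("latin-1").decode("utf-8")

def get_friends_df(filename: str, key: str) -> pd.DataFrame:
    df = pd.DataFrame(read_json(filename)[key])
    df["name"] = df["name"].map(decode)
    df["timestamp"] = df["timestamp"].map(get_timestamp)
    return df

friends_df = get_friends_df("friends.json", "friends")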
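
Since the first comment says this part can be covered by unit tests, here is a minimal sketch of what such a test might look like. It uses only the standard library rather than pandas, and `read_friends`, the sample name and the sample timestamp are hypothetical, made up for illustration; only `decode` mirrors the snippet above.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def decode(s: str) -> str:
    # Same round trip as decode() in the snippet above.
    return s.encode("latin-1").decode("utf-8")


def read_friends(filename, key="friends"):
    """Return (name, UTC datetime) pairs from one JSON file (hypothetical helper)."""
    with open(filename, encoding="utf-8") as f:
        records = json.load(f)[key]
    return [
        (decode(r["name"]), datetime.fromtimestamp(r["timestamp"], tz=timezone.utc))
        for r in records
    ]


def test_read_friends():
    # Build the garbled form of a Cyrillic name: UTF-8 bytes read as latin-1.
    garbled = "Анна".encode("utf-8").decode("latin-1")
    payload = {"friends": [{"name": garbled, "timestamp": 1596326400}]}
    with tempfile.TemporaryDirectory() as folder:
        path = Path(folder) / "friends.json"
        path.write_text(json.dumps(payload), encoding="utf-8")
        rows = read_friends(path)
    assert rows == [("Анна", datetime(2020, 8, 2, tzinfo=timezone.utc))]


test_read_friends()
```

Because the function takes a filename, the test can inject a temporary file and never needs a real export.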

epogrebnyak added a commit to epogrebnyak/facebook-json-to-csv that referenced this issue Aug 3, 2020
@epogrebnyak
Author

Got some progress here, maybe add more functionality?

[image]

https://github.com/epogrebnyak/facebook-json-to-csv/blob/master/friends.py

@itzmeanjan
Owner

It's a good suggestion, but in fviz data extraction and manipulation are done here, whereas data visualisation is handled here.

@itzmeanjan
Owner

> Got some progress here, maybe add more functionality?

Yes, adding more functionality would be good.

@itzmeanjan
Owner

I've already implemented a lot of this data manipulation functionality in fviz; you can take a look: 1, 2

@epogrebnyak
Author

epogrebnyak commented Aug 5, 2020

In fviz everything is now plugged into classes, which makes it hard to reuse. I think data acquisition should be separate from analysis (like here). Also, Comment and Post would look better as structures holding the final data, not as holders of raw information.

The script I wrote starts with a folder and ends with clean data, with no intent of plotting.

There are good bits in your code, but they look hidden in classes and are hard to reuse.
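
To illustrate the "folder in, clean data out" idea, here is a rough sketch with plain functions and no plotting. All function names, the `friends/friends.json` location and the CSV output are assumptions made for this example; it is not the code from either repository.

```python
import csv
import json
from pathlib import Path


def find_file(folder, *parts):
    """Build the path to one file inside the export folder."""
    return Path(folder).joinpath(*parts)


def friend_rows(path, key="friends"):
    """Yield clean (name, timestamp) rows from the raw JSON records."""
    with open(path, encoding="utf-8") as f:
        for record in json.load(f)[key]:
            # Undo UTF-8 text that was read as latin-1.
            name = record["name"].encode("latin-1").decode("utf-8")
            yield name, record["timestamp"]


def save_csv(rows, columns, target):
    """Write the rows to a CSV file with a header line."""
    with open(target, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(columns)
        writer.writerows(rows)


def friends_to_csv(folder, target):
    # Assumed layout: <folder>/friends/friends.json
    source = find_file(folder, "friends", "friends.json")
    save_csv(friend_rows(source), ["name", "timestamp"], target)
```

Each step is a small function, so any of them can be reused or tested on its own.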

@itzmeanjan
Owner

itzmeanjan commented Aug 5, 2020

In my opinion, putting functions under a class helps keep the namespace clean, though of course it makes them harder to find. But they are placed under separate hoods. For reusability, I think I can improve the API doc instead. What do you think, @epogrebnyak?

@epogrebnyak
Author

epogrebnyak commented Aug 5, 2020

@itzmeanjan it depends on your approach. Myself, I find it cleaner to follow some kind of pipeline with functions; it is often more testable too (easier to inject test parameters) and has less duplicate code. The parser part is: folder -> functions to find the file and originate a stream of values from JSON (getter) -> saving the values. To extract something from a JSON file, one needs just about the following:

address_book = Getter(
    name="address_book",
    path=["about_you", "your_address_books.json"],
    unpack=lambda xs: xs["address_book"]["address_book"],
    elem=lambda x: (decode(x["name"]), extract_address_book_details(x)),
    columns=["name", "contact"],
)

Then you can construct the filename from the folder and address_book.path, and apply [elem(x) for x in unpack(read_json(filename))]. This saves you from creating two extra modules and two new classes for each piece of information (friends, messages, posts, comments, etc.).
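
For what it's worth, here is one way such a Getter could be defined. This is a sketch only: the field names follow the call above, but the dataclass, the `filename` and `rows` methods and the `friends` example with its file location are assumptions, not the code from the linked repository.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Callable, Iterable, List


@dataclass
class Getter:
    name: str
    path: List[str]                    # path parts relative to the export folder
    unpack: Callable[[Any], Iterable]  # parsed JSON -> iterable of raw items
    elem: Callable[[Any], tuple]       # raw item -> clean row
    columns: List[str]

    def filename(self, folder) -> Path:
        # Join the export folder with the relative path parts.
        return Path(folder).joinpath(*self.path)

    def rows(self, folder) -> list:
        # Read the file, unpack the raw items and clean each one.
        with open(self.filename(folder), encoding="utf-8") as f:
            data = json.load(f)
        return [self.elem(x) for x in self.unpack(data)]


# Hypothetical getter for the friends list; the file location is assumed.
friends = Getter(
    name="friends",
    path=["friends", "friends.json"],
    unpack=lambda xs: xs["friends"],
    elem=lambda x: (x["name"], x["timestamp"]),
    columns=["name", "timestamp"],
)
```

With this shape, supporting a new piece of information means adding one more Getter instance rather than a new module or class.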
