DARHWOLF left a comment
Nice work!
- May I request the changes mentioned in the comments, please?
- Also, I see that the dataset is already available in `.csv` format. Was there any specific reason to convert it to `.txt`?
- Further, may I request that you clearly specify how this conversion was done, and how you verified that the contents of the dataset were not modified in the process?
README.md
Outdated
1. [example-causal-datasets](https://github.com/cmu-phil/example-causal-datasets): CC0 1.0 Universal. Last synced on 2026-02-05.
3.[Twins-datasets](http://www.nber.org/data/linked-birth-infant-death-data-vital-statistics-data.html)
2. [News-dataset] (https://www.fredjo.com/): Last downloaded on 2026-03-11.
This link points to the author's personal website. May I request that you find a better alternative here? Preferably a direct link to the webpage where this dataset can be accessed.
Also, please add the exact link to the dataset.
I see that there are multiple datasets in the zip file. Which dataset has been added in this PR?
Thank you both for the advice! I've updated the links so that they point directly to the dataset download.
I saw that #3 had a similar comment and decided to do the same for this.
Sure! The News dataset is split into `.x` and `.y` files, where `.x` contains a sparse representation of the features and `.y` contains treatment, y_factual, y_counterfactual, mu0, and mu1. To make this similar to the other datasets in this repository, I wrote a script that expands the sparse representation in `.x` into columns and then joins the result side-by-side with `.y`. The `.csv` file is converted to `.txt` simply by renaming the file with a `.txt` extension in Pandas.
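For reference, a conversion along these lines could look roughly like the sketch below. This is not the script used in this PR. It assumes that the `.x` file holds one `row,col,value` triple per line with 1-based indices, that its first line gives the matrix shape, and that the `.y` file has no header row. The function names and the `x1, x2, ...` column names are hypothetical. The `round_trip_ok` helper illustrates one possible way to check that writing the table out and reading it back leaves the values unchanged.

```python
import io

import numpy as np
import pandas as pd

# Column order taken from the description of the .y file.
Y_COLUMNS = ["treatment", "y_factual", "y_counterfactual", "mu0", "mu1"]


def load_sparse_x(x_file):
    """Expand sparse triples into a dense (n_rows, n_cols) array.

    Assumed layout: first line is `n_rows,n_cols,<ignored>`; every later
    line is `row,col,value` with 1-based indices.
    """
    triples = np.loadtxt(x_file, delimiter=",", ndmin=2)
    n_rows, n_cols = int(triples[0, 0]), int(triples[0, 1])
    dense = np.zeros((n_rows, n_cols))
    rows = triples[1:, 0].astype(int) - 1  # shift to 0-based indices
    cols = triples[1:, 1].astype(int) - 1
    dense[rows, cols] = triples[1:, 2]
    return dense


def convert(x_file, y_file):
    """Join the dense features side-by-side with the outcome columns."""
    dense = load_sparse_x(x_file)
    names = [f"x{i}" for i in range(1, dense.shape[1] + 1)]  # hypothetical names
    features = pd.DataFrame(dense, columns=names)
    outcomes = pd.read_csv(y_file, header=None, names=Y_COLUMNS)
    if len(features) != len(outcomes):
        raise ValueError("the .x and .y files describe different numbers of rows")
    return pd.concat([features, outcomes], axis=1)


def round_trip_ok(table):
    """Write the table as CSV text, read it back, and compare the values."""
    buffer = io.StringIO()
    table.to_csv(buffer, index=False)
    buffer.seek(0)
    reloaded = pd.read_csv(buffer)
    same_columns = list(reloaded.columns) == list(table.columns)
    return same_columns and np.allclose(reloaded.to_numpy(), table.to_numpy())


# Tiny made-up example: a 2 x 3 feature matrix with three nonzero entries.
x_text = "2,3,0\n1,2,4\n2,1,1\n2,3,2\n"
y_text = "1,5.0,3.0,2.5,4.5\n0,1.0,2.0,1.5,2.5\n"
table = convert(io.StringIO(x_text), io.StringIO(y_text))
```

With real files, the same functions accept file paths, and `table.to_csv("news.txt", index=False)` (a hypothetical output name) would write comma-separated content under a `.txt` extension.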
This is the first dataset, namely `topic_doc_mean_n5000_k3477_seed_1.csv.x` and `topic_doc_mean_n5000_k3477_seed_1.csv.y`. Happy to find another way to add this dataset instead if required.
The News dataset is one of the datasets suggested for addition in pgmpy/pgmpy#2620, and adding it resolves the issue pgmpy/pgmpy#2787.
The dataset is the `topic_doc_mean_n5000_k3477_seed_1.csv` dataset downloaded from https://www.fredjo.com/