Is it possible to convert only a list of tags and leave the rest as plain text #267

Open
sikri-eic opened this issue Feb 21, 2022 · 4 comments

Comments

@sikri-eic

I looked through the documentation and examples but couldn't find anything about this. I want to convert a handful of tags (<p>, <li>, <a>) to Markdown and render everything else as plain text. Is there a filtering mechanism where:

  • I can specify the tags I want to convert to Markdown
  • I can provide a format for converting the <a> tag (I want it to look like: text (link))
@mysticmind
Owner

mysticmind commented Feb 21, 2022

These are quite custom requirements. I would suggest adding a pre-processing step that uses HtmlAgilityPack to convert or process the relevant HTML nodes as you need, and then passing the resulting HTML to the Markdown conversion.

If you look at the source code, you can learn how I am using HtmlAgilityPack internally.
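
To make the suggested pre-processing step concrete, here is a minimal sketch that rewrites each <a> element into the "text (link)" form before the HTML is handed to the converter. The class and method names (AnchorFlattener, FlattenAnchors) are hypothetical and not part of ReverseMarkdown; only standard HtmlAgilityPack calls are used, and entity handling in link text and href values is left as-is.

```csharp
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper for illustration; not part of ReverseMarkdown.
public static class AnchorFlattener
{
    // Replaces every <a href="..."> element with a plain text node
    // of the form "text (href)" and returns the modified HTML.
    public static string FlattenAnchors(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return html;
        }

        // Copy to a list first, because the tree is modified inside the loop.
        foreach (var anchor in anchors.ToList())
        {
            var href = anchor.GetAttributeValue("href", string.Empty);
            var replacement = doc.CreateTextNode($"{anchor.InnerText} ({href})");
            anchor.ParentNode.ReplaceChild(replacement, anchor);
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

The returned string could then be passed to `new ReverseMarkdown.Converter().Convert(...)` as usual. Restricting which other tags become Markdown would need a similar pass over the remaining nodes.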

@mysticmind
Owner

mysticmind commented Feb 21, 2022

Quick follow-up note: I think there is room to extend PassThroughTags with an additional option to render the tags as text rather than HTML. Let me have a look and get back to you.

@sikri-eic
Author

Thank you for the prompt response. I started looking at the source, and I think it may be simpler to create a CustomConverter (which doesn't exist yet) out of the converters that you have already implemented. In that case, instead of finding all IConverter implementations and adding them to the _converters dictionary, I would add only the ones that I want to use. There is a problem with this approach, though: every class implementing IConverter requires a Converter in its constructor. If that parameter were an interface instead, I could implement the interface myself. It could look something like:

public interface ITopLevelConverter // I know, bad name :)
{
	Config Config { get; }
	string Convert(string html);
	void Register(string tagName, IConverter converter);
	IConverter Lookup(string tagName);
}

@sikri-eic
Author

I could hack something together using your code:

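// Namespaces assumed from the types used below.
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;
using ReverseMarkdown.Converters;

public class CustomConverter : ReverseMarkdown.Converter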
{
	private readonly IDictionary<string, IConverter> _converters = new Dictionary<string, IConverter>();
	private readonly IConverter _innerTextConverter;
	public CustomConverter()
	{
		_converters["p"] = new P(this);
		_converters["li"] = new Li(this);
		_converters["ol"] = new Ol(this);

		_innerTextConverter = new InnerText(this);
	}

	public new string Convert(string html)
	{
		html = ReverseMarkdown.Cleaner.PreTidy(html, Config.RemoveComments);

		var doc = new HtmlDocument();
		doc.LoadHtml(html);

		var root = doc.DocumentNode;

		// ensure to start from body and ignore head etc
		if (root.Descendants("body").Any())
		{
			root = root.SelectSingleNode("//body");
		}

		var result = Lookup(root.Name).Convert(root);

		return result.Trim();
	}

	public new IConverter Lookup(string tagName)
	{
		// Fall back to the plain-text converter for any tag that is not registered.
		return _converters.TryGetValue(tagName, out var converter) ? converter : _innerTextConverter;
	}
}

As you can see, this is not ideal (because it hides members of the base class), but it seems to work. Do you think this could be an extension point for the library? (BTW: since this now works for me, I don't really need it to be implemented in the library.)
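
To illustrate why hiding base members with `new` is fragile, here is a small self-contained sketch. BaseConverter and HidingConverter are hypothetical stand-ins, not ReverseMarkdown types. The point is that a hidden method is chosen by the compile-time type of the reference, so any code holding the object as the base type still calls the base implementation.

```csharp
// Hypothetical stand-in for a library base class with a non-virtual method.
public class BaseConverter
{
    public string Convert(string html) => "base:" + html;
}

// Hides (does not override) Convert, as CustomConverter does above.
public class HidingConverter : BaseConverter
{
    public new string Convert(string html) => "custom:" + html;
}

public static class HidingDemo
{
    // Returns the results of calling Convert through both reference types.
    public static string[] Run()
    {
        var custom = new HidingConverter();
        BaseConverter viaBase = custom; // same object, base-typed reference

        return new[]
        {
            custom.Convert("x"),  // resolved against HidingConverter
            viaBase.Convert("x"), // resolved against BaseConverter
        };
    }
}
```

Here `Run()` yields "custom:x" followed by "base:x". Whether this affects the converters constructed with `this` above depends on how they call back into the library, which this thread does not show.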
